The bulk of my project work over the last several months has focused on preparing artifact data for the Historic Fort Snelling building (the Short Barracks) that I plan to publish in OpenContext. Project staff and volunteers have been steadily combing through boxes and boxes of collections and creating inventory records. Together these records provide a consistent and well-structured (of course!) index linking the objects to important details, such as which of the 600+ boxes they are stored in.

We use controlled vocabulary terms to characterize the collections. I continue to be pleased with how well the Getty’s Art and Architecture Thesaurus covers the types of artifacts we need to describe, but we have added local vocabulary terms as needed, for example “ox shoes” to distinguish them from “horse shoes.”

I am also trying to capture the original catalog descriptions inside these basic inventory records. Many of the project volunteers have been steadily transcribing the old paper forms so that the transcriptions can be added to the records created through inventory. In fact, I have already married transcriptions with the inventory data for three small buildings at the site. Working with the original 1970s catalogs for the Short Barracks has turned out to be much harder than working with any of the other building catalogs, though. I quickly realized the “A” team had not been assigned to this cataloging. Let’s just say someone had a lot of trouble numbering artifacts correctly, among other things. Of the 6,650 descriptions matched, I annotated 490 records to explain problems such as, “The artifact labeled 317.55.1 does not match the catalog description for that number. It matches the description for 317.56.1.”
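
To make the matching and annotation step concrete, here is a minimal Python sketch. It is only an illustration, not the workflow actually used in the collections database. The record fields, the `attach_transcriptions` helper, and the `corrections` lookup are all hypothetical names, and any descriptions passed in would be sample data. The idea is to join transcribed descriptions to inventory records by catalog number and, where the number on the artifact points at the wrong description, to record an annotation rather than silently fixing it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class InventoryRecord:
    # Hypothetical fields; a real inventory record carries many more.
    catalog_number: str
    object_term: str
    box: str
    original_description: Optional[str] = None
    annotations: List[str] = field(default_factory=list)


def attach_transcriptions(
    inventory: List[InventoryRecord],
    transcriptions: Dict[str, str],
    corrections: Dict[str, str],
) -> List[str]:
    """Attach transcribed catalog descriptions to inventory records.

    transcriptions maps catalog number -> transcribed description.
    corrections maps the number written on an artifact -> the catalog
    number whose description actually fits it (a manual judgment).
    Returns the catalog numbers for which no description was found.
    """
    unmatched = []
    for record in inventory:
        # Use the corrected number when one has been recorded by hand.
        source = corrections.get(record.catalog_number, record.catalog_number)
        description = transcriptions.get(source)
        if description is None:
            unmatched.append(record.catalog_number)
            continue
        record.original_description = description
        if source != record.catalog_number:
            # Keep the discrepancy visible instead of hiding the fix.
            record.annotations.append(
                f"The artifact labeled {record.catalog_number} does not match "
                f"the catalog description for that number. It matches the "
                f"description for {source}."
            )
    return unmatched
```

Keeping the corrections in their own lookup means the original transcriptions stay untouched, and every annotated record explains why its description came from a different catalog number.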

Annotation has been on my mind since the beginning, so it’s interesting that the dataset I chose needed so much of it right out of the gate. Any good researcher will want to add data, right? Misidentifications along with errors like the ones I found are practically guaranteed. My partner, Dr. Kathryn Hayes, will undoubtedly need to add descriptive information as well. GitHub seems like the perfect tool for version control of datasets, but I would like to explore various ways to approach this issue.

After some trial and error, I have managed to generate an acceptable XML export report to extract the Short Barracks data from the MNHS collections management database. Technically, it is ready for Eric Kansa to review. I have experimented a bit with Open Refine to look at the data in CSV form and found that some elements are still not structured as I would like. I need to learn more about Open Refine conversion mapping. In fact, I need to learn a lot more about using Open Refine in general.
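
For readers curious what moving from an XML export to CSV can look like, here is a small Python sketch using only the standard library. It is not the actual MNHS export format: the element names (`export`, `record`, `catalog_number`, `object_term`, `box`) and the sample values are assumptions made up for illustration. It flattens each record into one CSV row and joins repeated elements with a `|` separator so that multi-valued fields stay in a single cell, where they can be split later.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical export; the real report uses its own element names.
SAMPLE_XML = """<export>
  <record>
    <catalog_number>317.55.1</catalog_number>
    <object_term>ox shoes</object_term>
    <object_term>hardware</object_term>
    <box>Box 12</box>
  </record>
  <record>
    <catalog_number>317.56.1</catalog_number>
    <object_term>nails</object_term>
  </record>
</export>"""


def xml_to_csv(xml_text, fields, separator="|"):
    """Flatten <record> elements into CSV text, one row per record."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in root.iter("record"):
        row = {}
        for name in fields:
            # Collect every non-empty occurrence of this element.
            values = [
                el.text.strip()
                for el in record.findall(name)
                if el.text and el.text.strip()
            ]
            # Missing elements become empty cells; repeats are joined.
            row[name] = separator.join(values)
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    print(xml_to_csv(SAMPLE_XML, ["catalog_number", "object_term", "box"]))
```

Writing missing elements as empty cells keeps every row the same width, which makes gaps in the export easy to spot once the CSV is opened in a tool like Open Refine.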

Ensuring that I push all aspects of my project forward is challenging (because working with data is my first love). I gave a presentation at the Midwest Historic Archaeology Conference in Minneapolis this fall to raise awareness of my work. I also met with my University of Minnesota partner Kathryn Hayes and MNHS Fort Snelling Program Specialist Matt Cassady. I plan to hold a second round of meetings in the next month to discuss the direction of the public-facing interpretive components.