Friday, May 28, 2021

Tampa Museum of Art joins Kerameikos + OpenRefine templates

The Tampa Museum of Art (TMA) has recently joined the Kerameikos project, supplying data and Creative Commons-licensed images for a dozen Attic vases that have been digitized so far as part of their new collections management system. These objects can be seen at the Kerameikos URI for the TMA.

Importantly, this is the first collection normalized in OpenRefine and directly exported into the Linked Art CIDOC-CRM RDF/XML aggregation model. Previous collections from the British Museum and Fitzwilliam were reconciled to Kerameikos URIs in OpenRefine and then exported into CSV for external processing with PHP scripts. I'd like to get away from this bespoke scripting, and OpenRefine's export templates are more than adequate for generating RDF for import into the Kerameikos SPARQL endpoint.

I have added this template to a Gist, and hopefully other projects can use it to do their own reconciliation and normalization and provide RDF to us without my doing this work personally. In the longer term, we aim to harvest Linked Art JSON-LD directly, which I prototyped in October 2019 with data from the Indianapolis Museum of Art.

First, you can see the TMA spreadsheet, post-reconciliation, here.

The forNonBlank GREL control makes it possible to include properties or nodes only when a URI is present in the spreadsheet:

{{forNonBlank(cells["Shape URI"], c, '<kon:hasShape rdf:resource="' + c.value + '"/>', "")}}

Here kon:hasShape is a subproperty of crm:P2_has_type; otherwise the Kerameikos data model follows the Linked Art profile closely. Concept URIs should be Kerameikos URIs. The Linked Art JSON-LD harvester normalizes Getty and other vocabulary URIs to Kerameikos ones when they are linked via skos:exactMatch.
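
For readers less familiar with GREL, the conditional behavior of forNonBlank can be mimicked outside OpenRefine. The following Python sketch is only an illustration and is not part of the Gist template: the function name shape_property and the dictionary-based row are assumptions, and treating whitespace-only cells as blank is a choice made here rather than a statement about how GREL behaves.

```python
from xml.sax.saxutils import quoteattr


def shape_property(row):
    """Emit kon:hasShape only when the row carries a Shape URI.

    Mirrors forNonBlank(cells["Shape URI"], c, <element>, ""):
    a blank cell yields an empty string, so nothing is written.
    """
    uri = (row.get("Shape URI") or "").strip()
    if not uri:
        return ""
    # quoteattr escapes the value and wraps it in quotes for use as an attribute
    return "<kon:hasShape rdf:resource=" + quoteattr(uri) + "/>"
```

Calling shape_property on a row without a "Shape URI" value returns an empty string, which is the same effect the empty fourth argument has in the template.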

Findspot URIs should be reconciled to Wikidata places:


{{forNonBlank(cells["Findspot URI"], c, '<crmsci:O19i_was_object_found_by>
    <crmsci:S19_Encounter_Event>
        <crm:P7_took_place_at>
            <crm:E53_Place>
                <rdfs:label xml:lang="en">' + cells["Findspot"].value + '</rdfs:label>
                <crm:P89_falls_within rdf:resource="' + c.value + '"/>
            </crm:E53_Place>
        </crm:P7_took_place_at>
    </crmsci:S19_Encounter_Event>
</crmsci:O19i_was_object_found_by>', "")}}
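
Because the findspot label is free text concatenated into the XML, a label containing a character such as & or < would make the exported RDF/XML ill-formed. The Python sketch below shows one way to check for this outside OpenRefine. The function names are hypothetical, and the namespace URIs are assumptions to confirm against the prefix section of the actual template. Inside the template itself, GREL's escape function with the "xml" mode may serve the same purpose.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Namespace URIs assumed for this sketch; confirm them against the template prefix.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "crmsci": "http://www.ics.forth.gr/isl/CRMsci/",
}


def findspot_fragment(label, uri):
    """Return the findspot block, or '' when no findspot URI is present."""
    if not uri:
        return ""
    return (
        "<crmsci:O19i_was_object_found_by>"
        "<crmsci:S19_Encounter_Event>"
        "<crm:P7_took_place_at>"
        "<crm:E53_Place>"
        # escape() protects &, < and > in the free-text label
        '<rdfs:label xml:lang="en">' + escape(label or "") + "</rdfs:label>"
        "<crm:P89_falls_within rdf:resource=" + quoteattr(uri) + "/>"
        "</crm:E53_Place>"
        "</crm:P7_took_place_at>"
        "</crmsci:S19_Encounter_Event>"
        "</crmsci:O19i_was_object_found_by>"
    )


def is_well_formed(fragment):
    """Wrap a fragment in rdf:RDF with the namespaces declared and try to parse it."""
    declarations = " ".join(
        'xmlns:%s="%s"' % (prefix, ns) for prefix, ns in NAMESPACES.items()
    )
    document = "<rdf:RDF %s>%s</rdf:RDF>" % (declarations, fragment)
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True
```

This only tests XML well-formedness, not whether the triples conform to the Linked Art profile, but it catches the most common export failure before the file reaches the SPARQL endpoint.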

Measurements are expressed with Getty AAT URIs for both the measurement type (e.g., height or width) and the unit (e.g., cm or mm). The following template renders a height in centimeters from the spreadsheet as RDF:

{{forNonBlank(cells["Height (cm)"], c, '<crm:P43_has_dimension>
    <crm:E54_Dimension>
        <crm:P90_has_value rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">' + c.value + '</crm:P90_has_value>
        <crm:P2_has_type rdf:resource="http://vocab.getty.edu/aat/300055644"/>
        <crm:P91_has_unit rdf:resource="http://vocab.getty.edu/aat/300379098"/>
    </crm:E54_Dimension>
</crm:P43_has_dimension>', "")}}
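
The template above types the value as xsd:decimal, so a cell such as "ca. 30" would pass the forNonBlank test yet produce an invalid literal. As a hedged illustration, the Python sketch below builds the same dimension node but skips values that do not match the xsd:decimal lexical form. The function name and the choice to drop invalid values silently are assumptions; the two AAT URIs are the ones used in the template above.

```python
import re
from xml.sax.saxutils import quoteattr

# Lexical form of xsd:decimal: optional sign, digits, optional fraction, no exponent.
XSD_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

AAT_HEIGHT = "http://vocab.getty.edu/aat/300055644"       # type URI from the template
AAT_CENTIMETERS = "http://vocab.getty.edu/aat/300379098"  # unit URI from the template


def dimension_fragment(value, type_uri, unit_uri):
    """Return a crm:P43_has_dimension block, or '' for blank or non-decimal values."""
    text = "" if value is None else str(value).strip()
    if not XSD_DECIMAL.fullmatch(text):
        return ""
    return (
        "<crm:P43_has_dimension>"
        "<crm:E54_Dimension>"
        '<crm:P90_has_value rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">'
        + text
        + "</crm:P90_has_value>"
        "<crm:P2_has_type rdf:resource=" + quoteattr(type_uri) + "/>"
        "<crm:P91_has_unit rdf:resource=" + quoteattr(unit_uri) + "/>"
        "</crm:E54_Dimension>"
        "</crm:P43_has_dimension>"
    )
```

Passing the type and unit as parameters means the same function could cover other measurement columns, provided the matching AAT URIs are looked up rather than guessed.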

Presently (until the TMA sorts out a server issue that keeps its IIIF image info.json from being accessible), the TMA images are JPEG files obtained by using the image API to request an 800-pixel-wide response. The model for representing IIIF services and manifests can be found at https://linked.art/model/digital/#iiif.
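
For reference, an 800-pixel-wide JPEG can be requested from an IIIF Image API service with a URL of the form {base}/full/800,/0/default.jpg. The Python sketch below builds such a URL. It assumes Image API 2.x or 3.0 (where the quality keyword is "default"), the function name is hypothetical, and the exact request used for the TMA images is not stated here.

```python
def iiif_jpeg_url(service_base, width=800):
    """Build an IIIF Image API request for the full image at a fixed width."""
    if width <= 0:
        raise ValueError("width must be a positive number of pixels")
    # Pattern: {base}/{region}/{size}/{rotation}/{quality}.{format}
    # "full" = whole image, "800," = scale to that width and keep the aspect ratio
    return "%s/full/%d,/0/default.jpg" % (service_base.rstrip("/"), int(width))
```

The service base would be the image's IIIF identifier URL, that is, the same URL that info.json is appended to.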
