Thursday, December 9, 2021

Using Wikidata APIs to regularize findspots

Combining attributes of two different pipelines, I have made a substantial update to the RDF ingestion process in the Kerameikos.org XForms back-end. As previously discussed in the development of a Linked Art JSON-LD harvester in fall 2019, findspot gazetteer URIs that match the Getty Thesaurus of Geographic Names, the UK's Ordnance Survey, and Geonames.org are reconciled to Wikidata URIs. A SPARQL query is then issued to the Wikidata endpoint to extract the coordinates, feature type/class, and parent geographic entity, if applicable.

CONSTRUCT {
  ?place a skos:Concept ;
         rdfs:label ?placeLabel ;
         skos:closeMatch ?osgeo ;
         skos:closeMatch ?tgn ;
         skos:closeMatch ?geonames ;
         skos:closeMatch ?pleiades ;
         skos:broader ?parent ;
         dct:coverage ?coord ;
         dct:type ?type .
}
WHERE {
  ?place wdt:P1667 "7015539" . # TGN ID for Vulci
  OPTIONAL { ?place wdt:P3120 ?osgeoid .
    BIND (uri(concat("http://data.ordnancesurvey.co.uk/id/", ?osgeoid)) AS ?osgeo) }
  OPTIONAL { ?place wdt:P1667 ?tgnid .
    BIND (uri(concat("http://vocab.getty.edu/tgn/", ?tgnid)) AS ?tgn) }
  OPTIONAL { ?place wdt:P1566 ?geonamesid .
    BIND (uri(concat("https://sws.geonames.org/", ?geonamesid, "/")) AS ?geonames) }
  OPTIONAL { ?place wdt:P1584 ?pleiadesid .
    BIND (uri(concat("https://pleiades.stoa.org/places/", ?pleiadesid)) AS ?pleiades) }
  OPTIONAL { ?place p:P625/ps:P625 ?coord }
  OPTIONAL { ?place wdt:P131 ?parent }
  OPTIONAL { ?place wdt:P31/wdt:P279+ ?type . FILTER (?type = wd:Q486972) } # is human settlement
  OPTIONAL { ?place wdt:P31 ?type FILTER (?type = wd:Q839954) } # archaeological site
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en"
  }
}
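
The first triple pattern in the WHERE clause depends on which gazetteer the findspot URI comes from. As a rough illustration (not the actual XForms code), the mapping from a gazetteer URI to that pattern could look like the Python sketch below. The function name and the regular expressions are assumptions; the URI prefixes and Wikidata properties are taken from the query above.

```python
import re

# Gazetteer URI patterns paired with the Wikidata identifier property that
# the query above uses for each source (TGN, GeoNames, Ordnance Survey).
GAZETTEER_PATTERNS = [
    (re.compile(r"^https?://vocab\.getty\.edu/tgn/(\d+)$"), "P1667"),
    (re.compile(r"^https?://sws\.geonames\.org/(\d+)/?$"), "P1566"),
    (re.compile(r"^https?://data\.ordnancesurvey\.co\.uk/id/(\w+)$"), "P3120"),
]


def wikidata_lookup_pattern(gazetteer_uri):
    """Return the triple pattern that finds the Wikidata item for a
    gazetteer URI, or None if the URI is not from a known gazetteer."""
    for pattern, prop in GAZETTEER_PATTERNS:
        match = pattern.match(gazetteer_uri)
        if match:
            return '?place wdt:%s "%s" .' % (prop, match.group(1))
    return None


if __name__ == "__main__":
    print(wikidata_lookup_pattern("http://vocab.getty.edu/tgn/7015539"))
```

The returned string would replace the hard-coded TGN line in the query before it is sent to the Wikidata endpoint.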

An iterative process generates RDF for each place (crm:E53_Place) and spatial feature (dually crmgeo:SP5_Geometric_Place_Expression and geo:SpatialThing to be compatible with both CIDOC-CRM and the WGS84 ontology) and its parent region. Spatial features are only attached to a place if it is a human settlement or archaeological site (so no coordinates that represent the central point of a region or nation).
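
The rule about when to attach a spatial feature can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and return shape are hypothetical, and it assumes the coordinate arrives as a WKT point literal with longitude first, as the Wikidata endpoint returns it.

```python
import re

# The two Wikidata classes tested in the query above.
SETTLEMENT = "http://www.wikidata.org/entity/Q486972"
ARCHAEOLOGICAL_SITE = "http://www.wikidata.org/entity/Q839954"
POINT_TYPES = {SETTLEMENT, ARCHAEOLOGICAL_SITE}

# Matches simple literals such as "Point(11.63 42.42)".
WKT_POINT = re.compile(
    r"^Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)$", re.IGNORECASE
)


def spatial_feature(type_uri, wkt):
    """Return {'lat': ..., 'long': ...} for settlements and archaeological
    sites; return None for anything else or for unparseable coordinates."""
    if type_uri not in POINT_TYPES or not wkt:
        return None
    match = WKT_POINT.match(wkt.strip())
    if match is None:
        return None
    # WKT order is longitude first, then latitude.
    longitude, latitude = float(match.group(1)), float(match.group(2))
    return {"lat": latitude, "long": longitude}


if __name__ == "__main__":
    # Illustrative coordinates only.
    print(spatial_feature(SETTLEMENT, "Point(11.63 42.42)"))
```

A parent region would fail the type check and therefore get no coordinates, which is the behavior described above.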

This workflow had applied only to Linked Art JSON-LD ingestion, which had been prototyped with a handful of vases from the Indianapolis Museum of Art at Newfields. Subsequently, we have ingested several other collections, where CSV or JSON exports were loaded into OpenRefine for further reconciliation and exported into the CIDOC-CRM model through OpenRefine's templating system. Prior to the implementation of the template system for the Tampa Museum of Art, I had written a PHP script to turn the British Museum's CSV export from OpenRefine (following my own cleanup) into RDF. The script performed the Wikidata SPARQL lookups illustrated above in order to incorporate the place RDF hierarchy directly in the RDF/XML file with the BM's objects, which I uploaded into the Kerameikos.org SPARQL endpoint. I had also applied this workflow to the Getty collection.

Now that the Wikidata reconciliation and SPARQL-based lookups have been integrated directly into the RDF ingestion system in the Kerameikos.org XForms engine, I have eliminated any need for creating bespoke PHP scripts to perform findspot hierarchy lookups for any collection that we integrate into the project.


Essentially, museums can either provide Linked Art JSON-LD for harvesting (if the JSON-LD includes the necessary Kerameikos or Getty URIs) or any spreadsheet can be cleaned up in OpenRefine (with findspots reconciled directly to Wikidata URIs) and exported directly into RDF/XML following the templating principles outlined above. The Kerameikos.org ingestion workflow will fill in any gaps in findspot coverage and geographic hierarchy without further software intervention. This is a significant advancement in the sustainability of our data integration workflow and allows us to fully standardize the data model for findspot places.

I plan to implement these updates into the Nomisma.org ingestion engine next.

Thursday, December 2, 2021

Aligning Kerameikos.org more directly with CIDOC-CRM

When the Kerameikos.org project was founded in 2013, our intent was for the LOD thesaurus system to be modeled primarily in SKOS, with instances in certain categories to be designated subject-specific RDF classes in our own ontology (e.g., kon:Shape) or classes in existing ontologies (for example, foaf:Person and foaf:Group).

Our thesaurus is still built around SKOS, but since we have aligned our vase aggregation RDF model with Linked Art (a community-built CIDOC-CRM profile serialized as JSON-LD), I have subsequently made some alterations to the classes we use for concept URIs and updated our ontology.

These changes affect the RDF concepts themselves, and I have also searched and replaced classes throughout the Kerameikos codebase.

  • foaf:Person has been replaced with crm:E21_Person
  • foaf:Group has been replaced with crm:E74_Group
  • kon:ProductionPlace has been replaced with crm:E53_Place and kon:ProductionPlace has been deprecated from the Kerameikos ontology.
    • Spatial expressions are dually compatible with both CIDOC-CRM and the WGS84 ontology in that the crm:E53_Place concept includes both geo:location and crm:P168_place_is_defined_by properties linking to the same node URI, which carries both the geo:SpatialThing and crmgeo:SP5_Geometric_Place_Expression classes. These spatial features may include geo:lat and geo:long (for points) or osgeo:asGeoJSON as before, but now include the crmgeo:asWKT property with a datatype of http://www.opengis.net/ont/geosparql#wktLiteral, which should make these points and polygons compatible with endpoints that support the GeoSPARQL protocol. See the machine-readable data underlying http://kerameikos.org/id/athens, for example.
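
To make the dual typing concrete, here is a minimal Python sketch that writes a Turtle fragment for one place and its point. It is not the Kerameikos serializer: the function name and the example URIs are hypothetical, prefix declarations are omitted, and the coordinates in the usage example are illustrative values only.

```python
WKT_DATATYPE = "http://www.opengis.net/ont/geosparql#wktLiteral"


def place_with_point_turtle(place_uri, node_uri, lat, lon):
    """Build a Turtle fragment in which the place links to one node through
    both geo:location and crm:P168_place_is_defined_by, and that node carries
    both spatial classes plus a WKT literal."""
    wkt = "POINT(%s %s)" % (lon, lat)  # WKT puts longitude before latitude
    return "\n".join([
        "<%s> a crm:E53_Place ;" % place_uri,
        "    geo:location <%s> ;" % node_uri,
        "    crm:P168_place_is_defined_by <%s> ." % node_uri,
        "",
        "<%s> a geo:SpatialThing, crmgeo:SP5_Geometric_Place_Expression ;" % node_uri,
        "    geo:lat %s ;" % lat,
        "    geo:long %s ;" % lon,
        '    crmgeo:asWKT "%s"^^<%s> .' % (wkt, WKT_DATATYPE),
    ])


if __name__ == "__main__":
    print(place_with_point_turtle(
        "http://example.org/id/place", "http://example.org/id/place#this",
        37.97, 23.72))
```

Because the same node is reachable through both properties, a WGS84-aware client and a CIDOC-CRM-aware client would each find the geometry by following the property they already understand.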

The Kerameikos.org ontology page has been significantly revised to make it more transparent than before, in line with improvements we have made to the Nomisma page in recent years. The ontology URI now supports content negotiation, so RDF/XML or Turtle can be requested as alternatives by sending the relevant MIME type in the Accept header. We have also implemented ontology versions, so that you can compare the 2015 edition with the current 2021 revision.
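
A content-negotiated request might be put together as in the Python sketch below. The ontology URI used here is an assumption, since it is not spelled out in this post, and the helper names are hypothetical. The MIME types are the standard ones for RDF/XML and Turtle.

```python
import urllib.request

# Assumed ontology URI; substitute the real one if it differs.
ONTOLOGY_URI = "http://kerameikos.org/ontology"

MIME_TYPES = {
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
}


def ontology_request(serialization, uri=ONTOLOGY_URI):
    """Build (but do not send) a request asking for one serialization."""
    return urllib.request.Request(uri, headers={"Accept": MIME_TYPES[serialization]})


def fetch(request):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    request = ontology_request("turtle")
    print(request.full_url, request.get_header("Accept"))
    # print(fetch(request))  # uncomment to make the network call
```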

The ontology has been tightened up with better definitions of our few custom ceramic-oriented RDF classes (Shape, Technique, and Style), all of which are subclasses of crm:E55_Type. There is one property, kon:hasShape, a subproperty of crm:P2_has_type, which is intended to link a Human-Made Object (a vase, the rdfs:domain) to a kon:Shape (the rdfs:range). Therefore, this expression is fully compatible with CIDOC-CRM's own domains and ranges while also conforming to the standard intellectual vocabulary of pottery specialists. We may implement a "Fabric" class as a subclass of crm:E57_Material in order to make technical distinctions between the clay from Corinth and Attica, for example. We will expand the scope of our ontology, and its relationship to CIDOC-CRM, as use cases arise.

Wednesday, November 17, 2021

Techniques published to Kerameikos

About two dozen techniques have been published to Kerameikos.org, researched, prepared, and vetted by the project's graduate student interns and the Archaic/Classical Greek working group of Tyler Jo Smith and Renee Gondek. Some of these are linked hierarchically; for example, Black-figure has both silhouette and incised as parent concepts. These techniques were derived initially from the Beazley Archive Pottery Database before normalization and reconciliation with URIs in other vocabulary schemes, such as the Getty Art & Architecture Thesaurus.

Red-figure map


Friday, July 2, 2021

Nearly 4000 Getty Museum vases (and fragments) integrated into Kerameikos

Thanks to the help of David Newbury and Brenda Podemski at the Getty Museum, I have managed to integrate nearly 4,000 vases and fragments of vases from the Getty Museum into Kerameikos.org. Using the prototype Getty SPARQL endpoint, I was able to construct a query to extract all vessels created on or before 300 BC. This certainly includes more than our narrower scope of Archaic and Classical Athenian pottery (for example, more than 100 Red-figure Apulian vases), but we can revisit full reconciliation to relevant URIs at a later stage.

The data in the Getty SPARQL endpoint haven't been fully normalized to Getty vocabulary URIs, and so the classifications of production places, materials, and shapes were parsed and reconciled from textual statements in OpenRefine. This process was relatively straightforward and only took a few hours.

I then spent some time muddling around with OpenRefine GREL conditionals for multiple artists in an export template, which I have uploaded to Gist. When an object has more than one artist that contributed to its production, the main production event consists of (crm:P9_consists_of) two parts, which were carried out by different individuals. You could assign a role to these individuals at the production level, but it can usually be extracted from the SKOS concept of the artist, where we use the W3C org ontology to assign a role of painter and/or potter. Between the Getty and BM data, we probably have enough specimens to map relationships between artists who collaborated in producing pottery.

Getty 72.AE.148, a collaboration between Exekias and The Painter of the Vatican Mourner

For example, Douris produced more works with Onesimos than any other artist (see SPARQL query). It wouldn't take much more work to build an API on this query that delivers JSON for the d3plus Network visualization library, much like what I've already done for Hellenistic monograms and Roman Republican die links for numismatic projects.

 


Of course, the Getty's images are IIIF, and linking to the IIIF manifest allows us to display and annotate multiple high-resolution images for an object.

Friday, May 28, 2021

Tampa Museum of Art joins Kerameikos + OpenRefine templates

The Tampa Museum of Art (TMA) has recently joined the Kerameikos project, supplying data and Creative Commons-licensed images for a dozen Attic vases that have been digitized so far as part of their new collections management system. These objects can be seen at the Kerameikos URI for the TMA.

Tampa Museum of Art
 

Importantly, this is the first collection normalized in OpenRefine and directly exported into the Linked Art CIDOC-CRM RDF/XML aggregation model. Previous collections from the British Museum and Fitzwilliam were reconciled to Kerameikos URIs in OpenRefine and then exported into CSV for external processing with PHP scripts. I'd like to get away from this bespoke scripting, and OpenRefine's export templates are more than adequate for generating RDF for import into the Kerameikos SPARQL endpoint.

I have added this template to Gist, and hopefully other projects can use it to do their own reconciliation and normalization, and provide RDF to us without me personally doing this work. In the longer term, we are aiming to harvest Linked Art JSON-LD directly, which I had previously prototyped in October 2019 with data from the Indianapolis Museum of Art.

First, you can see the TMA spreadsheet, post-reconciliation, here.

The forNonBlank GREL statement enables including properties or nodes only if a URI is present in the spreadsheet:

{{forNonBlank(cells["Shape URI"], c, '<kon:hasShape rdf:resource="' + c.value + '"/>', "")}}

Here, kon:hasShape is a subproperty of crm:P2_has_type; otherwise, the Kerameikos data model follows the Linked Art profile closely. Concept URIs should be Kerameikos ones. The Linked Art JSON-LD harvester normalizes Getty and other URIs to Kerameikos URIs when they are linked via skos:exactMatch.

Findspot URIs should be reconciled to Wikidata places:


{{forNonBlank(cells["Findspot URI"], c, '<crmsci:O19i_was_object_found_by>
    <crmsci:S19_Encounter_Event>
        <crm:P7_took_place_at>
            <crm:E53_Place>
                <rdfs:label xml:lang="en">' + cells["Findspot"].value + '</rdfs:label>
                <crm:P89_falls_within rdf:resource="' + c.value + '"/>                        
            </crm:E53_Place>
        </crm:P7_took_place_at>
    </crmsci:S19_Encounter_Event>
</crmsci:O19i_was_object_found_by>', "")}}

Measurements are expressed by using Getty AAT URIs for the measurement type (e.g., height, width, etc.) and unit (cm, mm, etc.). The following template renders a height measurement in centimeters from the spreadsheet into RDF:

{{forNonBlank(cells["Height (cm)"], c, '<crm:P43_has_dimension>
    <crm:E54_Dimension>
        <crm:P90_has_value rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">' + c.value + '</crm:P90_has_value>
        <crm:P2_has_type rdf:resource="http://vocab.getty.edu/aat/300055644"/>
        <crm:P91_has_unit rdf:resource="http://vocab.getty.edu/aat/300379098"/>
    </crm:E54_Dimension>
</crm:P43_has_dimension>', "")}}
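
For anyone generating the same RDF/XML outside OpenRefine, the conditional pattern in the templates above can be mimicked in Python. This is a sketch under stated assumptions, not a port of the Gist template: the helper names are hypothetical, only None and the empty string are treated as blank, and the shape URI in the usage example is illustrative.

```python
from xml.sax.saxutils import escape, quoteattr

AAT_HEIGHT = "http://vocab.getty.edu/aat/300055644"
AAT_CENTIMETERS = "http://vocab.getty.edu/aat/300379098"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def for_non_blank(value, render):
    """Call render(value) unless the cell is None or an empty string."""
    if value is None or value == "":
        return ""
    return render(str(value))


def shape_element(row):
    """Emit kon:hasShape only when the row has a shape URI."""
    return for_non_blank(
        row.get("Shape URI"),
        lambda uri: "<kon:hasShape rdf:resource=%s/>" % quoteattr(uri),
    )


def height_element(row):
    """Emit a centimeter height dimension only when a value is present."""
    return for_non_blank(
        row.get("Height (cm)"),
        lambda value: (
            "<crm:P43_has_dimension><crm:E54_Dimension>"
            '<crm:P90_has_value rdf:datatype="%s">%s</crm:P90_has_value>'
            '<crm:P2_has_type rdf:resource="%s"/>'
            '<crm:P91_has_unit rdf:resource="%s"/>'
            "</crm:E54_Dimension></crm:P43_has_dimension>"
        ) % (XSD_DECIMAL, escape(value), AAT_HEIGHT, AAT_CENTIMETERS),
    )


if __name__ == "__main__":
    row = {"Shape URI": "http://kerameikos.org/id/amphora", "Height (cm)": "41.5"}
    print(shape_element(row))
    print(height_element(row))
```

Escaping the cell values before they go into the XML is a design choice of this sketch; it guards against characters such as ampersands in labels or values.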

Presently (until the TMA sorts out a server issue with their IIIF image info.json not being accessible), the TMA images are JPEG files formed by using the image API to get an 800-pixel-wide response, but the model for representing IIIF services and manifests can be found at https://linked.art/model/digital/#iiif.

Wednesday, May 5, 2021

Prototype Object Viewer in Kerameikos

Over the last few days, I have put together a prototype of an object viewer within the Kerameikos.org framework that reads the vase URI from a URL parameter and executes a SPARQL query of the underlying Linked Art-compliant CIDOC-CRM data to gather all of the metadata necessary to create a nice, human-readable page. The construction of these pages includes an API call to Kerameikos to get all of the associated SKOS concept data for any kerameikos.org URI referred to by the vase RDF. This pipeline can be extended to query data APIs from Nomisma.org, the Getty vocabularies, or other controlled vocabulary data systems.

I have taken the additional step of implementing an XSLT function that returns multilingual UI labels, even though almost none of these UI labels have been translated into other languages yet. However, the language (whether set by the Accept-Language header by the browser or manually overridden with the 'lang' request parameter) is used to display the preferred label for the Kerameikos.org SKOS concept, if it is available in the underlying RDF data. This is often, though not always, the case for concepts that have been aligned to Wikidata, whose labels are extracted programmatically from its API.
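
The label selection logic can be sketched in Python as follows. The real implementation is an XSLT function, so this is only an analogy: the function names are hypothetical, the English fallback is an assumption, and the labels in the usage example are illustrative.

```python
def preferred_languages(accept_language, lang_param=None):
    """Order language codes by preference. An explicit 'lang' request
    parameter overrides the Accept-Language header entirely."""
    if lang_param:
        return [lang_param.lower()]
    ranked = []
    for position, item in enumerate((accept_language or "").split(",")):
        parts = item.strip().split(";")
        code = parts[0].strip().lower()
        if not code or code == "*":
            continue
        quality = 1.0  # default weight when no q value is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        ranked.append((-quality, position, code))
    # Highest q first; header order breaks ties.
    return [code for _, _, code in sorted(ranked)]


def preferred_label(labels, languages, fallback="en"):
    """Pick the first available label, trying 'fr' when 'fr-fr' is absent."""
    for code in languages:
        for candidate in (code, code.split("-")[0]):
            if candidate in labels:
                return labels[candidate]
    return labels.get(fallback)


if __name__ == "__main__":
    labels = {"en": "Amphora", "fr": "Amphore"}
    print(preferred_label(labels, preferred_languages("fr-FR,fr;q=0.9,en;q=0.8")))
```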

Collections that make their images available through IIIF manifests (represented by crm:P129i_is_subject_of), such as the Fitzwilliam Museum, will have these manifests rendered by Mirador. For other collections that conform to IIIF image APIs, but do not produce manifests, such as the British Museum, the image(s) will be displayed in the Leaflet IIIF viewer. Eventually, I will generate an intermediate API that dynamically generates a manifest from underlying IIIF image URIs so that these images can be annotated with iconographic URIs in order to build a more LOD-integrated research tool for iconography. This framework will extend beyond just vases to encompass other types of material culture.


British Museum vase of Exekias, partially displayed in French.

These pages are constructed by the following URL pattern:

http://kerameikos.org/object/?uri={object URI}

A dynamic GeoJSON response that may contain the production place coordinates or polygon and/or the findspot coordinates follows the pattern:

http://kerameikos.org/object/geoJSON?uri={object URI}

Example: http://kerameikos.org/object/geoJSON?uri=https://www.britishmuseum.org/collection/object/G_1836-0224-127
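
Both URL patterns can be built programmatically. The Python helpers below are a small sketch with hypothetical function names; they leave ':' and '/' unescaped so that the result matches the example above, which is a choice made for this sketch rather than a documented requirement of the service.

```python
from urllib.parse import quote

BASE = "http://kerameikos.org/object/"


def object_page_url(object_uri):
    """URL of the human-readable object page for a vase URI."""
    return BASE + "?uri=" + quote(object_uri, safe=":/")


def object_geojson_url(object_uri):
    """URL of the GeoJSON response for the same vase URI."""
    return BASE + "geoJSON?uri=" + quote(object_uri, safe=":/")


if __name__ == "__main__":
    uri = "https://www.britishmuseum.org/collection/object/G_1836-0224-127"
    print(object_page_url(uri))
    print(object_geojson_url(uri))
```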

A link has been added to any image popup in the various concept pages (see below).

A popup of a vase of the Achilles Painter.
 

In the long term, I hope to be able to peel this functionality from the Kerameikos.org software architecture and turn it into a standalone system that is more generalizable for any CIDOC-CRM data that conforms to the profile expressed by the Linked Art community. This system is entirely driven by SPARQL queries at the moment, but I plan to integrate Fuseki with Solr or ElasticSearch to build out a faceted search interface and various data visualization tools, from geographic distributions to networks of artists to other sorts of statistical distributions. The system will be agnostic about specific types of content (vases), and could serve as a large-scale aggregation and research tool for many types of objects, a sort of new rendition of Pelagios' dormant Peripleo.

Friday, March 19, 2021

The Fitzwilliam Museum Attic vases aggregated into Kerameikos

With API access to the Fitzwilliam Museum collection granted by Dan Pett, I was able to spend a few hours yesterday evening writing a script to query relevant terracotta objects produced in Athens from the database. Loading these into OpenRefine, I eliminated object types that are not vases (e.g., figurines or architectural fragments), which resulted in a total of about 700 objects from the Fitzwilliam integrated into Kerameikos.org's SPARQL endpoint for query and visualization.

The Fitzwilliam Museum page on Kerameikos

The various concepts used in cataloging the Fitz collection were reconciled in OpenRefine to Kerameikos URIs, and like the British Museum data, more than 60 findspots were aligned with Wikidata.org URIs, making it possible to visualize the geographic distribution of relevant concepts.

The "Objects of the Typology" section of each page only shows those items with photographs, but the geographic and distribution analyses include all relevant objects. Eventually, I will implement CSV exports for data so that they can be more easily reused in other platforms.

A quick analysis of the Fitzwilliam's collection shows that Black-figure lekythoi are prominent compared to other Black- and Red-figure objects.

See results here.