Thursday, December 1, 2016

Experimenting with IIIF, CIDOC-CRM, and the Europeana Data Model

One of our major future tasks is to facilitate sophisticated analysis of Greek pottery aggregated by coreferencing Kerameikos.org concepts with other vocabulary systems. The proof of concept that we demonstrated at CAA in Paris included dozens of Greek vases from the Getty and the British Museum. Another potential data partner, one we could work with immediately, is the Harvard Art Museums.

The Harvard Art Museums have adopted an open approach to their collection and have implemented a powerful, well-documented API. This API has allowed us to integrate thousands of Greek and Roman coins from their collection into Nomisma.org, where they are available through several type corpus projects (such as Online Coins of the Roman Empire). Harvard holds a fine collection of Greek pottery, and the API provides a means of harvesting these records programmatically. Furthermore, Harvard has adopted the International Image Interoperability Framework (IIIF), which would enable zooming into large images and dynamically extracting portions of them, and would facilitate annotating these images with related iconographic or decorative subject matter or inscriptions. Since we use CIDOC-CRM to describe the vases, the question is: how can we extend our model to include metadata that enables IIIF functionality directly in Kerameikos.org?

Fortunately, the hard work has already been done for us. Europeana has published specifications for linking to IIIF services and manifests within the Europeana Data Model (EDM), and there are a number of useful examples, such as those provided by John Howard at the University College Dublin Library.

While we may migrate from FOAF to EDM properties for linking to large images or thumbnails (edm:preview and others), we do not need to modify the current system of foaf:thumbnail and foaf:depiction to accommodate IIIF integration.

Instead, we can add a few more triples about the URL of the foaf:depiction. For example:

?vase a crm:E22_Man-Made_Object ;
    foaf:depiction <http://nrs.harvard.edu/urn-3:HUAM:DDC251369_dynmc>.

<http://nrs.harvard.edu/urn-3:HUAM:DDC251369_dynmc> a edm:WebResource ;
    svcs:has_service <https://ids.lib.harvard.edu/ids/iiif/46594017> ;
    dcterms:isReferencedBy <http://iiif.harvardartmuseums.org/manifests/object/288118>.

<https://ids.lib.harvard.edu/ids/iiif/46594017> a svcs:Service ;
    dcterms:conformsTo <http://iiif.io/api/image> ;
    doap:implements <http://iiif.io/api/image/2/level2.json>.
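The svcs:Service URI is the base of every IIIF Image API request. As a minimal sketch, assuming the Image API 2.x URI syntax ({service}/{region}/{size}/{rotation}/{quality}.{format}) advertised by the level2.json profile above, the zoomable viewer's info.json and derivative images such as a thumbnail or a cropped detail could be derived from it like this. The helper names are hypothetical, not part of Kerameikos.org.

```python
# Hypothetical helpers: derive IIIF Image API 2.x URLs from the svcs:Service URI
# attached to an edm:WebResource. Syntax: {service}/{region}/{size}/{rotation}/{quality}.{format}


def iiif_info_url(service: str) -> str:
    """URL of the image information document that zooming viewers load first."""
    return service.rstrip("/") + "/info.json"


def iiif_image_url(service: str, region: str = "full", size: str = "full",
                   rotation: str = "0", quality: str = "default", fmt: str = "jpg") -> str:
    """Build an Image API 2.x image request; region 'x,y,w,h' crops, size 'w,' scales by width."""
    return f"{service.rstrip('/')}/{region}/{size}/{rotation}/{quality}.{fmt}"


service = "https://ids.lib.harvard.edu/ids/iiif/46594017"
info = iiif_info_url(service)                               # document for the zoom viewer
thumb = iiif_image_url(service, size="400,")                # 400-pixel-wide derivative
detail = iiif_image_url(service, region="0,0,1000,1000")    # top-left 1000x1000 crop
```

The same pattern covers the "dynamic extraction of portions of images" mentioned earlier: only the region segment changes.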

In our SPARQL query for aggregating objects, we can optionally extract the dcterms:isReferencedBy for the foaf:depiction of our Greek vase. An XSLT conditional in the stylesheet that parses the SPARQL response then determines whether our fancyBox jQuery plugin shows a popup of an image file or a popup window containing the Leaflet IIIF plugin.
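The same branching can be sketched outside XSLT. The sketch below assumes the response uses the standard SPARQL 1.1 JSON results format and that the query binds hypothetical variables ?depiction, ?service and ?manifest, the last two from OPTIONAL patterns, so they are simply absent when a vase has no IIIF metadata. The second vase and its image URL are illustrative placeholders.

```python
import json

# Illustrative SPARQL 1.1 JSON response; ?service and ?manifest come from OPTIONAL
# patterns (hypothetical variable names) and are absent for the second vase.
RESPONSE = """{
  "head": {"vars": ["object", "depiction", "service", "manifest"]},
  "results": {"bindings": [
    {"object": {"type": "uri", "value": "http://example.org/vase/1"},
     "depiction": {"type": "uri", "value": "http://nrs.harvard.edu/urn-3:HUAM:DDC251369_dynmc"},
     "service": {"type": "uri", "value": "https://ids.lib.harvard.edu/ids/iiif/46594017"},
     "manifest": {"type": "uri", "value": "http://iiif.harvardartmuseums.org/manifests/object/288118"}},
    {"object": {"type": "uri", "value": "http://example.org/vase/2"},
     "depiction": {"type": "uri", "value": "http://example.org/images/vase2.jpg"}}
  ]}
}"""


def popup_for(binding):
    """Mirror the XSLT conditional: use the IIIF viewer when IIIF metadata is bound,
    otherwise fall back to a plain image popup."""
    if "manifest" in binding and "service" in binding:
        return {"viewer": "leaflet-iiif",
                "info": binding["service"]["value"].rstrip("/") + "/info.json"}
    return {"viewer": "image", "src": binding["depiction"]["value"]}


popups = [popup_for(b) for b in json.loads(RESPONSE)["results"]["bindings"]]
```

Checking for both variables keeps a vase with a manifest but no image service from breaking the viewer.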

As a simple proof of concept, I have extracted two vases by the Berlin Painter from Harvard (RDF here), implemented the RDF model, and modified the accompanying SPARQL queries and code to show zoomable images of these vases.

Thursday, November 10, 2016

Distribution visualization with SPARQL and d3js

After more than a year of dormancy, I have picked up Kerameikos.org development again in preparation for a collaboration with the Beazley Archive of the University of Oxford and, hopefully, a grant application. We hope to publish the entire array of identifiers necessary for Archaic and Classical Greek pottery and to develop more advanced analysis and visualization systems built upon open vase data we can acquire from a variety of sources (e.g., the British Museum and the Harvard Art Museums).

Aside from some minor stylistic updates to the site, I implemented two major changes:

1. I rewrote the geographic visualizations to serialize the SPARQL response into GeoJSON and render it in Leaflet, replacing the OpenLayers-based Timemap library, which has not seen active development in at least five years. I really like being able to scroll through a timeline of objects, but I will have to wait for a Leaflet plugin that can do something similar.

2. I implemented SPARQL-based distribution visualization with the d3plus plugin to d3js. The code was almost entirely ported from the Nomisma.org distribution analysis features I have recently been working on.
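The first change above, serializing the SPARQL response into GeoJSON, can be sketched as follows. This assumes SPARQL 1.1 JSON results with hypothetical variables ?place, ?label, ?lat and ?long; the sample place URI and coordinates are illustrative only.

```python
import json


def bindings_to_geojson(bindings):
    """Convert SPARQL JSON result rows into a GeoJSON FeatureCollection.
    Assumes hypothetical variables ?place, ?label, ?lat and ?long."""
    features = []
    for b in bindings:
        if "lat" not in b or "long" not in b:
            continue  # skip places without coordinates rather than emit bad geometry
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude]
            "geometry": {"type": "Point",
                         "coordinates": [float(b["long"]["value"]),
                                         float(b["lat"]["value"])]},
            "properties": {"uri": b["place"]["value"], "name": b["label"]["value"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Illustrative rows: one place with coordinates, one without.
SAMPLE = [
    {"place": {"type": "uri", "value": "http://example.org/place/athens"},
     "label": {"type": "literal", "value": "Athens"},
     "lat": {"type": "literal", "value": "37.97"},
     "long": {"type": "literal", "value": "23.72"}},
    {"place": {"type": "uri", "value": "http://example.org/place/unknown"},
     "label": {"type": "literal", "value": "Unknown"}},
]
geojson_text = json.dumps(bindings_to_geojson(SAMPLE))
```

On the client, a FeatureCollection like this can be passed directly to Leaflet's `L.geoJSON` layer.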

This builds on the previously established model, in which request parameters are parsed within Orbeon's XML Pipeline Language (XPL) and assembled into an XML object that is then transformed with XSLT into a textual SPARQL query. The difference here is that the example vases from the Getty and the British Museum are represented as Linked Open Data with CIDOC-CRM and are classified with typological URIs from their own vocabulary systems (AAT/ULAN/TGN and the British Museum's internal LOD thesaurus, respectively). As a result, the XML model that represents the query is significantly more complex than that of the Nomisma visualizations, which are built on a simpler RDF model and a single vocabulary system.

In the query below, we are getting the distribution of shapes for Red Figure pottery:

SELECT DISTINCT ?concept ?label (count(?concept) as ?count) WHERE {
  {
    SELECT ?1 WHERE { kid:red_figure skos:exactMatch ?1 }
  }
  ?object crm:P32_used_general_technique ?1 .
  ?object kon:hasShape ?dist .
  {
    SELECT ?dist ?label ?concept WHERE {
      ?concept skos:exactMatch ?dist ;
               skos:prefLabel ?label .
      FILTER langMatches(lang(?label), "en")
    }
  }
} GROUP BY ?concept ?label ORDER BY ?label

As you can see, a subselect gathers all of the URIs that are SKOS exact matches for the Kerameikos URI, and the query then gets the objects created with this technique. We use kon:hasShape, a simplified property that better represents knowledge organization specifically within ceramics studies, to get the shape URIs. Like techniques, these URIs may come from the AAT or the BM thesaurus, so we have to get the matching Kerameikos URI and extract its English label. Here is the full query. Here are the results of the SPARQL query in HTML.
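For readers who want to run a query like this from a script, here is a minimal sketch of how it could be sent with the standard SPARQL 1.1 Protocol (an HTTP GET with a `query` parameter, asking for JSON results). The endpoint URL is a hypothetical placeholder, and the PREFIX declarations from the full query are omitted here just as they are in the excerpt.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint URL; substitute the real SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"

# The excerpt above; the full query also declares the kid:, skos:, crm: and kon:
# prefixes, which must be prepended before the endpoint will accept it.
QUERY = """SELECT DISTINCT ?concept ?label (count(?concept) as ?count) WHERE {
  { SELECT ?1 WHERE { kid:red_figure skos:exactMatch ?1 } }
  ?object crm:P32_used_general_technique ?1 .
  ?object kon:hasShape ?dist .
  { SELECT ?dist ?label ?concept WHERE {
      ?concept skos:exactMatch ?dist ; skos:prefLabel ?label .
      FILTER langMatches(lang(?label), "en") } }
} GROUP BY ?concept ?label ORDER BY ?label"""


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol GET request for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def fetch_bindings(request):
    """Send the request and return the result rows (requires network access)."""
    with urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


req = build_request(ENDPOINT, QUERY)
```

Each returned row would carry `concept`, `label` and `count` bindings, one per shape.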

The XPL/XSLT stylesheet that builds the XML model underlying the SPARQL query is on GitHub. Below is an example, where $object is the object in the triple. The $id variable is derived from the position of this piece of the query within the HTTP request parameter, since it must be unique within the query. The parameter, in this case, is 'compare=technique kid:red_figure'. Queries can be made more precise by concatenating multiple predicate-object pairs with a semicolon.

<statements>
    <select id="{$id}">
        <triple s="{$object}" p="skos:exactMatch" o="?{$id}"/>
    </select>
    <triple s="?object" p="crm:P32_used_general_technique" o="?{$id}"/>
    <triple s="?object" p="kon:hasShape" o="?dist"/>
</statements>
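The parameter handling described above can be approximated in a few lines. This is a sketch, not the project's XPL: it assumes a hypothetical mapping from field names to predicates, splits the compare value on semicolons, and numbers each predicate-object pair by its position to produce the unique $id. The value kid:amphora is used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from request-parameter field names to RDF predicates.
PREDICATES = {
    "technique": "crm:P32_used_general_technique",
    "shape": "kon:hasShape",
}


def compare_to_statements(compare, dist_predicate="kon:hasShape"):
    """Turn a compare value such as 'technique kid:red_figure' (pairs joined by ';')
    into a <statements> element shaped like the XML model above."""
    statements = ET.Element("statements")
    for position, pair in enumerate(compare.split(";"), start=1):
        field, obj = pair.split()   # e.g. 'technique', 'kid:red_figure'
        var_id = str(position)      # position-based, so unique within the query
        select = ET.SubElement(statements, "select", id=var_id)
        ET.SubElement(select, "triple", s=obj, p="skos:exactMatch", o=f"?{var_id}")
        ET.SubElement(statements, "triple", s="?object", p=PREDICATES[field], o=f"?{var_id}")
    # The distribution category is always the last triple.
    ET.SubElement(statements, "triple", s="?object", p=dist_predicate, o="?dist")
    return statements


single = compare_to_statements("technique kid:red_figure")
combined = compare_to_statements("technique kid:red_figure; shape kid:amphora")
```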

This XML is transformed with XSLT into SPARQL and executed within the XPL. As in Nomisma, you can compare multiple query sets.
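To make the XML-to-SPARQL step concrete, here is a rough Python equivalent of what such a transformation might do. It is not the project's XSLT. It assumes $id has resolved to 1 and $object to kid:red_figure, turns each `<select>` into a subquery projecting its id variable, and turns each `<triple>` into a triple pattern.

```python
import xml.etree.ElementTree as ET

# The XML model with $id = 1 and $object = kid:red_figure substituted in.
STATEMENTS = """<statements>
  <select id="1">
    <triple s="kid:red_figure" p="skos:exactMatch" o="?1"/>
  </select>
  <triple s="?object" p="crm:P32_used_general_technique" o="?1"/>
  <triple s="?object" p="kon:hasShape" o="?dist"/>
</statements>"""


def triple_to_sparql(t):
    """Render a <triple> element as a SPARQL triple pattern."""
    return f"{t.get('s')} {t.get('p')} {t.get('o')} ."


def statements_to_sparql(elem, indent="  "):
    """Render <statements> children as graph patterns: <select> becomes a subquery
    projecting its id variable, <triple> becomes a plain triple pattern."""
    lines = []
    for child in elem:
        if child.tag == "select":
            inner = " ".join(triple_to_sparql(t) for t in child.findall("triple"))
            lines.append(f"{indent}{{ SELECT ?{child.get('id')} WHERE {{ {inner} }} }}")
        elif child.tag == "triple":
            lines.append(indent + triple_to_sparql(child))
    return "\n".join(lines)


where_clause = statements_to_sparql(ET.fromstring(STATEMENTS))
```

The resulting lines correspond to the first half of the WHERE clause in the query shown earlier; the label-lookup subselect would be appended by the surrounding template.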

Distribution of shapes for Red vs. Black Figure Greek pottery (from a limited sample size)

On Kerameikos ID pages, charts are generated via AJAX, but on the distribution page they are generated by passing request parameters, so that charts can be copied and pasted. Furthermore, you can download a CSV file representing each dataset, which will include geographic coordinates if Production Place is the distribution category.
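A CSV export of this kind can be sketched with the standard library alone. The sketch assumes the rows arrive in the SPARQL 1.1 JSON results format and writes one column per query variable, leaving unbound variables (such as coordinates for a non-geographic category) empty. The sample concept URI, label and count are made-up values, not actual Kerameikos results.

```python
import csv
import io


def bindings_to_csv(variables, bindings):
    """Write SPARQL JSON result rows to CSV text; unbound variables become empty cells."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(variables)
    for b in bindings:
        writer.writerow([b[v]["value"] if v in b else "" for v in variables])
    return out.getvalue()


# Made-up example row; a Production Place distribution would also bind ?lat and ?long.
SAMPLE = [
    {"concept": {"type": "uri", "value": "http://example.org/id/amphora"},
     "label": {"type": "literal", "value": "Amphora"},
     "count": {"type": "literal", "value": "3"}},
]
csv_text = bindings_to_csv(["concept", "label", "count", "lat", "long"], SAMPLE)
```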