Friday, July 2, 2021

Nearly 4000 Getty Museum vases (and fragments) integrated into Kerameikos

Thanks to the help of David Newbury and Brenda Podemski at the Getty Museum, I have managed to integrate nearly 4,000 vases and fragments of vases from the Getty Museum into Kerameikos.org. Using the prototype Getty SPARQL endpoint, I was able to construct a query to extract all vessels created on or before 300 BC. This certainly includes more than our narrower scope of Archaic and Classical Athenian pottery (for example, more than 100 Red-figure Apulian vases), but we can revisit full reconciliation to relevant URIs at a later stage.

The data in the Getty SPARQL endpoint haven't been fully normalized to Getty vocabulary URIs, so the classifications of production places, materials, and shapes were parsed and reconciled from textual statements in OpenRefine. This process was relatively straightforward and only took a few hours.

I then spent some time muddling around with OpenRefine GREL conditionals for handling multiple artists in an export template, which I have uploaded to a Gist. When more than one artist contributed to an object's production, the main production event consists of (crm:P9_consists_of) separate parts, each carried out by a different individual. You could assign a role to these individuals at the production level, but the role can usually be derived from the SKOS concept of the artist, where we use the W3C org ontology to assign a role of painter and/or potter. Between the Getty and BM data, we probably have enough specimens to map the collaborative relationships between artists in the production of pottery.
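To make the multi-artist pattern concrete, here is a minimal Python sketch using ElementTree rather than the GREL template in the Gist. The function name `production_xml`, the input (a list of artist URIs per object), and the crm:P108i_was_produced_by / crm:P14_carried_out_by wrapper (the general Linked Art production pattern) are assumptions for illustration, not a copy of the template.

```python
import xml.etree.ElementTree as ET

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("crm", CRM)
ET.register_namespace("rdf", RDF)


def production_xml(artist_uris):
    """Return a crm:P108i_was_produced_by element for one object.

    One artist is attached directly to the production event; two or more
    artists each get their own part via crm:P9_consists_of."""
    produced_by = ET.Element(f"{{{CRM}}}P108i_was_produced_by")
    production = ET.SubElement(produced_by, f"{{{CRM}}}E12_Production")
    if len(artist_uris) == 1:
        ET.SubElement(production, f"{{{CRM}}}P14_carried_out_by",
                      {f"{{{RDF}}}resource": artist_uris[0]})
    else:
        for uri in artist_uris:
            consists_of = ET.SubElement(production, f"{{{CRM}}}P9_consists_of")
            part = ET.SubElement(consists_of, f"{{{CRM}}}E12_Production")
            ET.SubElement(part, f"{{{CRM}}}P14_carried_out_by",
                          {f"{{{RDF}}}resource": uri})
    return produced_by
```

`ET.tostring(production_xml([...]), encoding="unicode")` serializes the fragment with `crm:` and `rdf:` prefixes; ElementTree places the namespace declarations on the fragment's root element.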

Getty 72.AE.148, a collaboration between Exekias and The Painter of the Vatican Mourner

For example, Douris produced more works with Onesimos than with any other artist (see SPARQL query). It wouldn't take much more work to build an API on this query that delivers JSON for the d3plus Network visualization library, much like what I've already done for Hellenistic monograms and Roman Republican die links in numismatic projects.
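Such an API could reshape the query results roughly as follows. This hypothetical Python sketch assumes the SPARQL rows have already been grouped into a mapping of object URI to artist URIs; the nodes/links/weight structure is an assumption about what a d3plus Network expects and should be checked against the d3plus documentation.

```python
from collections import Counter
from itertools import combinations


def collaboration_network(object_artists):
    """Build nodes/links from {object URI: [artist URIs]}.

    Every pair of distinct artists on the same object adds 1 to that link's weight."""
    weights = Counter()
    nodes = set()
    for artists in object_artists.values():
        unique = sorted(set(artists))
        nodes.update(unique)
        for a, b in combinations(unique, 2):
            weights[(a, b)] += 1
    return {
        "nodes": [{"id": n} for n in sorted(nodes)],
        "links": [
            {"source": a, "target": b, "weight": w}
            for (a, b), w in sorted(weights.items())
        ],
    }
```

Passing the result through `json.dumps` would give the response body; the link weight is what makes a dominant partnership stand out in the graph.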

 


Of course, the Getty's images are served via IIIF, and linking to the IIIF manifest allows us to display and annotate multiple high-resolution images for an object.

Friday, May 28, 2021

Tampa Museum of Art joins Kerameikos + OpenRefine templates

The Tampa Museum of Art (TMA) has recently joined the Kerameikos project, supplying data and Creative Commons-licensed images for a dozen Attic vases that have been digitized so far as part of their new collections management system. These objects can be seen at the Kerameikos URI for the TMA.

Tampa Museum of Art
 

Importantly, this is the first collection normalized in OpenRefine and directly exported into the Linked Art CIDOC-CRM RDF/XML aggregation model. Previous collections from the British Museum and Fitzwilliam were reconciled to Kerameikos URIs in OpenRefine and then exported as CSV for external processing with PHP scripts. I'd like to get away from this bespoke scripting, and OpenRefine's export templates are more than adequate for generating RDF for import into the Kerameikos SPARQL endpoint.

I have added this template to a Gist, and hopefully other projects can use it to do their own reconciliation and normalization and provide RDF to us without my personally doing this work. In the longer term, we aim to harvest Linked Art JSON-LD directly, which I prototyped in October 2019 with data from the Indianapolis Museum of Art.

First, you can see the TMA spreadsheet, post-reconciliation, here.

The forNonBlank GREL statement enables including properties or nodes only if a URI is present in the spreadsheet:

{{forNonBlank(cells["Shape URI"], c, '<kon:hasShape rdf:resource="' + c.value + '"/>', "")}}

Here, kon:hasShape is a subproperty of crm:P2_has_type; otherwise, the Kerameikos data model follows the Linked Art profile quite precisely. Concept URIs should be Kerameikos ones, although the Linked Art JSON-LD harvester normalizes Getty and other URIs to Kerameikos when they are linked via skos:exactMatch.
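For anyone generating the same RDF outside OpenRefine, the forNonBlank logic translates directly. A minimal Python sketch, assuming cells arrive as strings or None; the helper names are hypothetical. GREL treats null and the empty string as blank, and `quoteattr` escapes the URI for use as an XML attribute.

```python
from xml.sax.saxutils import quoteattr


def for_non_blank(value, render, fallback=""):
    """Mimic GREL forNonBlank: render the value only when the cell is non-blank."""
    # GREL treats null and the empty string as blank
    if value is None or value == "":
        return fallback
    return render(value)


def shape_element(shape_uri):
    # kon:hasShape is the Kerameikos subproperty of crm:P2_has_type
    return for_non_blank(
        shape_uri,
        lambda uri: f"<kon:hasShape rdf:resource={quoteattr(uri)}/>",
    )
```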

Findspot URIs should be reconciled to Wikidata places:


{{forNonBlank(cells["Findspot URI"], c, '<crmsci:O19i_was_object_found_by>
    <crmsci:S19_Encounter_Event>
        <crm:P7_took_place_at>
            <crm:E53_Place>
                <rdfs:label xml:lang="en">' + cells["Findspot"].value + '</rdfs:label>
                <crm:P89_falls_within rdf:resource="' + c.value + '"/>                        
            </crm:E53_Place>
        </crm:P7_took_place_at>
    </crmsci:S19_Encounter_Event>
</crmsci:O19i_was_object_found_by>', "")}}

Measurements are expressed by using Getty AAT URIs for the measurement type (e.g., height, width, etc.) and unit (cm, mm, etc.). Below illustrates rendering a centimeter height measurement from the spreadsheet into RDF:

{{forNonBlank(cells["Height (cm)"], c, '<crm:P43_has_dimension>
    <crm:E54_Dimension>
        <crm:P90_has_value rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">' + c.value + '</crm:P90_has_value>
        <crm:P2_has_type rdf:resource="http://vocab.getty.edu/aat/300055644"/>
        <crm:P91_has_unit rdf:resource="http://vocab.getty.edu/aat/300379098"/>
    </crm:E54_Dimension>
</crm:P43_has_dimension>', "")}}

Presently (until the TMA sorts out a server issue that prevents their IIIF image info.json from being accessible), the TMA images are JPEG files generated by requesting an 800-pixel-wide response from the IIIF Image API, but the model for representing IIIF services and manifests can be found at https://linked.art/model/digital/#iiif.
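The 800-pixel derivative follows the IIIF Image API URL template `{base}/{region}/{size}/{rotation}/{quality}.{format}`. A minimal Python sketch, assuming `service_base` already includes the image identifier; the `default` quality keyword applies to Image API 2.x and 3.0 (1.x used `native`).

```python
def iiif_image_url(service_base, width=800, fmt="jpg"):
    """Full region, scaled to `width` pixels wide ("w," size syntax), no rotation."""
    return f"{service_base.rstrip('/')}/full/{width},/0/default.{fmt}"


def iiif_info_url(service_base):
    """The image information document that describes the service."""
    return f"{service_base.rstrip('/')}/info.json"
```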

Wednesday, May 5, 2021

Prototype Object Viewer in Kerameikos

Over the last few days, I have put together a prototype object viewer within the Kerameikos.org framework that reads the vase URI from a URL parameter and executes a SPARQL query of the underlying Linked Art-compliant CIDOC CRM data to gather all of the metadata necessary to create a clean, human-readable page. Constructing each page includes an API call to Kerameikos to get all of the associated SKOS concept data for any kerameikos.org URI referred to by the vase RDF. This pipeline can be extended to query data APIs from Nomisma.org, the Getty vocabularies, or other controlled vocabulary systems.

I have also implemented an XSLT function that returns multilingual UI labels, even though almost none of these UI labels have been translated into other languages yet. However, the language (whether set by the browser's Accept-Language header or manually overridden with the 'lang' request parameter) is used to display the preferred label for each Kerameikos.org SKOS concept, if a label in that language is available in the underlying RDF data. This is often, though not always, the case for concepts that have been aligned to Wikidata, whose labels were extracted programmatically from the Wikidata API.
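The label-selection rule can be sketched in Python (the viewer itself does this in XSLT). The function names, the dictionary of lower-cased language tags, and the fallback to English are assumptions; the precedence (the `lang` parameter first, then Accept-Language in q-value order) follows the description above.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, highest q-value first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if tag and tag != "*" and q > 0:
            ranked.append((-q, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


def preferred_label(labels, accept_language="", lang_param=None, default="en"):
    """Choose a prefLabel: the 'lang' parameter wins, then the header, then `default`.

    `labels` maps lower-case language tags to label strings."""
    candidates = [lang_param.lower()] if lang_param else []
    candidates += parse_accept_language(accept_language)
    for tag in candidates:
        for key in (tag, tag.split("-")[0]):  # try "fr-fr", then "fr"
            if key in labels:
                return labels[key]
    return labels.get(default)
```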

Collections that make their images available through IIIF manifests (represented by crm:P129i_is_subject_of), such as the Fitzwilliam Museum, will have these manifests rendered by Mirador. For other collections that support the IIIF Image API but do not produce manifests, such as the British Museum, the image(s) will be displayed in the Leaflet IIIF viewer. Eventually, I will build an intermediate API that dynamically generates a manifest from the underlying IIIF image URIs, so that these images can be annotated with iconographic URIs in order to build a more LOD-integrated research tool for iconography. This framework will extend beyond vases to encompass other types of material culture.
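The viewer choice reduces to a simple precedence rule. This Python sketch uses hypothetical field names, and the plain-image fallback is an assumption for collections that expose neither manifests nor image services.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObjectImages:
    manifest: Optional[str] = None  # IIIF manifest linked via crm:P129i_is_subject_of
    image_services: List[str] = field(default_factory=list)  # IIIF Image API service URIs
    static_images: List[str] = field(default_factory=list)  # plain JPEG URLs


def choose_viewer(images: ObjectImages) -> Optional[str]:
    if images.manifest:
        return "mirador"  # a manifest is available: render it in Mirador
    if images.image_services:
        return "leaflet-iiif"  # image service but no manifest
    if images.static_images:
        return "img"  # assumed fallback: a plain <img> element
    return None
```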


British Museum vase of Exekias, partially displayed in French.

These pages are constructed by the following URL pattern:

http://kerameikos.org/object/?uri={object URI}

A dynamic GeoJSON response that may contain the production place coordinates or polygon and/or the findspot coordinates follows the pattern:

http://kerameikos.org/object/geoJSON?uri={object URI}

Example: http://kerameikos.org/object/geoJSON?uri=https://www.britishmuseum.org/collection/object/G_1836-0224-127
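When building these links in a script, percent-encoding the object URI is a conservative choice (the example above passes it unencoded), assuming the server decodes the parameter in the usual way. A minimal Python sketch with hypothetical helper names:

```python
from urllib.parse import urlencode

BASE = "http://kerameikos.org/object/"


def object_page_url(object_uri):
    """Human-readable object page."""
    return BASE + "?" + urlencode({"uri": object_uri})


def object_geojson_url(object_uri):
    """GeoJSON with production place and/or findspot geometry."""
    return BASE + "geoJSON?" + urlencode({"uri": object_uri})
```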

A link has been added to any image popup in the various concept pages (see below).

A popup of a vase of the Achilles Painter.
 

In the long term, I hope to separate this functionality from the Kerameikos.org software architecture and turn it into a standalone system that is generalizable to any CIDOC CRM dataset that conforms to the profile expressed by the Linked Art community. This system is entirely driven by SPARQL queries at the moment, but I plan to integrate Fuseki with Solr or Elasticsearch to build out a faceted search interface and various data visualization tools, from geographic distributions to networks of artists to other sorts of statistical distributions. The system will be agnostic about specific types of content (such as vases), and could serve as a large-scale aggregation and research tool for many types of objects, a sort of new rendition of Pelagios' dormant Peripleo.

Friday, March 19, 2021

The Fitzwilliam Museum Attic vases aggregated into Kerameikos

With API access to the Fitzwilliam Museum collection granted by Dan Pett, I was able to spend a few hours yesterday evening writing a script to query the database for relevant terracotta objects produced in Athens. After loading these into OpenRefine, I eliminated object types that are not vases (e.g., figurines or architectural fragments), which resulted in about 700 objects from the Fitzwilliam integrated into Kerameikos.org's SPARQL endpoint for query and visualization.

The Fitzwilliam Museum page on Kerameikos

The various concepts used in cataloging the Fitz collection were reconciled in OpenRefine to Kerameikos URIs, and like the British Museum data, more than 60 findspots were aligned with Wikidata.org URIs, making it possible to visualize the geographic distribution of relevant concepts.

The "Objects of the Typology" section of each page only shows items with photographs, but the geographic and distribution analyses include all relevant objects. Eventually, I will implement CSV exports so that the data can be more easily reused in other platforms.

A quick analysis of the Fitzwilliam's collection shows that Black-figure lekythoi are prominent compared to other Black- and Red-figure objects.

See results here.


Wednesday, March 17, 2021

More than 4,200 British Museum Athenian vases integrated into Kerameikos.org

In a major leap forward, more than 4,200 Athenian Greek vases (primarily Archaic and Classical) have been linked to Kerameikos.org URIs and integrated into the SPARQL endpoint for query and visualization.

A large number of relevant painters and potters (about 300) have been published to Kerameikos.org in recent months. While work remains to create URIs and definitions for the remaining notable Classical Red-figure painters and to fill in the gaps of less notable entities from the Beazley Archive Pottery Database vocabularies, enough artists now exist as entities within the Kerameikos LOD ecosystem to begin aggregating open access museum collections.

The first collection chosen was the British Museum's. Using the new Collections database, more than 5,000 Athenian ceramic objects (full vases and fragments) were exported as a CSV and then normalized and reconciled to Kerameikos.org URIs through our own OpenRefine reconciliation API.

As a result, 4,200 vases from the BM have been linked to Kerameikos.org shapes (the lowest barrier to entry in the system). Other concepts, such as painter/potter, production place, time period, and technique, were linked to Kerameikos.org as well (although our coverage is not yet complete in these areas). Furthermore, about three-fourths of these objects have findspots in the BM database, which were normalized to the lowest-level geographic entity represented in Wikidata.org, representing more than 100 different places. The excavation pottery is easiest to spot, with hundreds of artifacts coming from Kameiros on Rhodes and Naucratis in Egypt (see http://kerameikos.org/id/british_museum). These reconciled findspots were queried with the Wikidata SPARQL endpoint to extract a fuller geographic hierarchy (as well as matching URIs in Pleiades, the Getty Thesaurus of Geographic Names, and Geonames.org), making it possible to query all of the objects found in the modern region of Etruria or the country of Turkey. For example, the query below will get all objects found in Italy (Q38):


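PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX crmsci: <http://www.ics.forth.gr/isl/CRMsci/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?object ?place ?placeLabel ?lat ?long WHERE {
  ?object crmsci:O19i_was_object_found_by ?encounter .
  ?encounter crm:P7_took_place_at/crm:P89_falls_within ?place .
  ?place crm:P89_falls_within+ <http://www.wikidata.org/entity/Q38> ;
         geo:location ?loc ;
         rdfs:label ?placeLabel .
  ?loc geo:lat ?lat ;
       geo:long ?long
} LIMIT 100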
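To run a query like this from a script, a client can use the SPARQL 1.1 Protocol: a GET request with a `query` parameter and an `Accept` header asking for JSON results. A minimal standard-library Python sketch; the endpoint URL is a placeholder assumption, not a confirmed Kerameikos address, and the query string must carry its own PREFIX declarations.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://kerameikos.org/query"  # hypothetical; check the actual endpoint URL


def sparql_request(query, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    return Request(endpoint + "?" + urlencode({"query": query}),
                   headers={"Accept": "application/sparql-results+json"})


def rows(results):
    """Flatten SPARQL JSON results into {variable: value} dicts (unbound variables omitted)."""
    variables = results["head"]["vars"]
    return [{v: b[v]["value"] for v in variables if v in b}
            for b in results["results"]["bindings"]]


def run(query):
    """Send the request and return flattened rows (performs network I/O)."""
    with urlopen(sparql_request(query)) as response:
        return rows(json.load(response))
```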


So based on the data we have, we can map production places and findspots associated with any sort of concept defined by Kerameikos.

Distribution of the Berlin Painter.

Now that we have significantly more data in the system (despite nearly all of it coming from a single source), geographic and distribution analysis visualizations begin to look a bit more accurate. This is not a full picture, but it is a clear demonstration of the sorts of research tools that Linked Open Data methods make possible for Greek pottery.

Black Figure technique distribution among BM data.

Below the example objects (which are now paginated in groups of 48), the distribution analysis chart can be generated nearly instantaneously. Here's a distribution of Black-figure shapes:

Black Figure distribution of shapes

The results of these queries can be downloaded as CSV or opened in a new page for bookmarking and citation, or refined further to compare different sets of data. As you can see, there are far more Black- than Red-figure lekythoi.
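Conceptually, a distribution chart like this is a count per category divided by the total. A Python sketch under the assumption that each record has already been reduced to a shape and a technique (the slugs below are illustrative); the real charts are computed from SPARQL queries.

```python
from collections import Counter


def distribution(records, technique):
    """Count shapes among objects of one technique, with percentages (largest first)."""
    counts = Counter(r["shape"] for r in records if r.get("technique") == technique)
    total = sum(counts.values())
    return [(shape, n, round(100 * n / total, 1)) for shape, n in counts.most_common()]
```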


A note about the BM images

The CSV export from the British Museum includes a single column for an image. Most images published by the BM follow the IIIF Image API protocol, but some of them are static JPEGs on the server. I need to implement better detection of IIIF versus non-IIIF images served by the BM until they are able to make IIIF manifests available.
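One cheap way to separate the two cases is to test whether a URL matches the IIIF Image API request pattern and, if so, recover the service base. This Python sketch is a URL-shape heuristic only and an assumption about how the BM column is populated; fetching `{base}/info.json` would be the stronger check.

```python
import re

IIIF_IMAGE_REQUEST = re.compile(
    r"^(?P<base>https?://.+?)"
    r"/(?P<region>full|square|(?:pct:)?[\d.]+,[\d.]+,[\d.]+,[\d.]+)"
    r"/(?P<size>full|max|\^?!?(?:pct:[\d.]+|\d*,\d*))"
    r"/(?P<rotation>!?\d+(?:\.\d+)?)"
    r"/(?P<quality>default|color|gray|bitonal|native)"
    r"\.(?P<format>jpg|tif|png|gif|jp2|pdf|webp)$"
)


def classify_image(url):
    """Return ('iiif', service_base) for IIIF Image API requests, else ('static', url)."""
    match = IIIF_IMAGE_REQUEST.match(url)
    if match:
        return "iiif", match.group("base")
    return "static", url
```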


Friday, October 25, 2019

Linked Art data harvesting and aligning to ARIADNE for archaeological context

As mentioned in the related numismatic blog post, First pass at processing Linked Art JSON-LD to Nomisma RDF, and the slides presented by the Smithsonian's Adam Soroka on my behalf at the Linked Art showcase last month at the Victoria and Albert Museum in London, Linked Art JSON-LD harvesting is now functional in the kerameikos.org back-end. Built around test data provided by Sami Norling at the Indianapolis Museum of Art at Newfields and supplemented with some additional properties and Getty URIs, JSON-LD is processed by the XForms engine in Orbeon (which powers both the Nomisma and Kerameikos frameworks). Getty vocabulary URIs are mapped to applicable Kerameikos ones, and the JSON-LD is distilled into its essential graph form as RDF/XML and posted into the Kerameikos SPARQL endpoint.
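How the RDF/XML is posted is not spelled out here; one standard option is the SPARQL 1.1 Graph Store HTTP Protocol, which Apache Jena Fuseki supports. A Python sketch that builds (but does not send) such a request; the store URL and graph URI are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical Fuseki-style Graph Store Protocol endpoint
GRAPH_STORE = "http://localhost:3030/kerameikos/data"


def graph_store_post(rdf_xml, graph_uri, store=GRAPH_STORE):
    """Build a POST that merges RDF/XML into the named graph `graph_uri`."""
    return Request(
        store + "?" + urlencode({"graph": graph_uri}),
        data=rdf_xml.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(graph_store_post(xml, graph))`; POST adds triples to the graph, while PUT would replace the graph's contents.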

For each JSON-LD GET operation, the following three tasks are initiated:

Automatic reconciliation of URIs to Kerameikos

Distinct entities related to each vase (shapes, materials, styles, techniques, artists, production places, etc.) are aggregated into a list. A SPARQL query is executed for each one (that isn't already a Kerameikos URI) in order to get the equivalent Kerameikos URI via skos:exactMatch. These mappings are stored so that SPARQL queries do not need to be executed multiple times for the same URI.
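The lookup-once behavior can be sketched as a small cache in front of the SPARQL lookup. In this Python sketch the `lookup` callable and the query template are assumptions standing in for the real skos:exactMatch query.

```python
KERAMEIKOS_PREFIX = "http://kerameikos.org/id/"

# Illustrative query a lookup function might send for each external URI
EXACT_MATCH_QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept WHERE {{ ?concept skos:exactMatch <{uri}> }} LIMIT 1"""


class Reconciler:
    """Map external concept URIs to Kerameikos URIs, querying each URI at most once."""

    def __init__(self, lookup):
        self.lookup = lookup  # lookup(uri) -> Kerameikos URI or None
        self.cache = {}

    def reconcile(self, uri):
        if uri.startswith(KERAMEIKOS_PREFIX):
            return uri  # already a Kerameikos URI: no query needed
        if uri not in self.cache:
            self.cache[uri] = self.lookup(uri)
        return self.cache[uri]

    def reconcile_all(self, uris):
        # dict.fromkeys de-duplicates while keeping the original order
        return {uri: self.reconcile(uri) for uri in dict.fromkeys(uris)}
```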

Normalizing findspot URIs to Wikidata entities

Findspot URIs, following the proposed ARIADNE Plus data model (more details below), may come from Geonames, Pleiades, the Getty Thesaurus of Geographic Names, Ordnance Survey, or Wikidata. These URIs are first queried in the Kerameikos endpoint to see whether they have already been normalized and harvested. If not, a SPARQL query is sent to the Wikidata.org endpoint in order to find the Wikidata Q entity related to the gazetteer URI. The Wikidata entity URI therefore serves as the primary URI scheme for findspots, regardless of which gazetteer a dataset may use locally. The SPARQL query also gathers the skos:exactMatch URIs from the Getty TGN, Pleiades, Ordnance Survey, and Geonames, when available, and extracts latitudes and longitudes.

CONSTRUCT {
  ?place a skos:Concept ;
         skos:prefLabel ?placeLabel ;
         skos:exactMatch ?osgeo ;
         skos:exactMatch ?tgn ;
         skos:exactMatch ?geonames ;
         skos:exactMatch ?pleiades ;
         dct:coverage ?coord .
}
WHERE {
  ?place wdt:P1667 "7015539" . #TGN ID for Vulci
  OPTIONAL { ?place wdt:P3120 ?osgeoid .
             BIND (uri(concat("http://data.ordnancesurvey.co.uk/id/", ?osgeoid)) as ?osgeo) }
  OPTIONAL { ?place wdt:P1667 ?tgnid .
             BIND (uri(concat("http://vocab.getty.edu/tgn/", ?tgnid)) as ?tgn) }
  OPTIONAL { ?place wdt:P1566 ?geonamesid .
             BIND (uri(concat("http://sws.geonames.org/", ?geonamesid, "/")) as ?geonames) }
  OPTIONAL { ?place wdt:P1584 ?pleiadesid .
             BIND (uri(concat("https://pleiades.stoa.org/places/", ?pleiadesid)) as ?pleiades) }
  OPTIONAL { ?place p:P625/ps:P625 ?coord }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en"
  }
}
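The hard-coded TGN triple at the top of the WHERE clause generalizes to any supported gazetteer. A Python sketch that turns a gazetteer URI into the matching WHERE-clause triple, reusing the Wikidata property IDs from the CONSTRUCT query above; the function name and URI patterns are assumptions.

```python
import re

# Wikidata properties as used in the CONSTRUCT query
GAZETTEER_PATTERNS = [
    (re.compile(r"^https?://vocab\.getty\.edu/tgn/(\d+)$"), "P1667"),
    (re.compile(r"^https?://sws\.geonames\.org/(\d+)/?$"), "P1566"),
    (re.compile(r"^https?://pleiades\.stoa\.org/places/(\d+)"), "P1584"),
    (re.compile(r"^https?://data\.ordnancesurvey\.co\.uk/id/(\d+)$"), "P3120"),
]


def wikidata_lookup_clause(gazetteer_uri):
    """Return the WHERE-clause pattern that binds ?place for a gazetteer URI."""
    wikidata = re.match(r"^https?://www\.wikidata\.org/entity/(Q\d+)$", gazetteer_uri)
    if wikidata:
        return f"BIND (wd:{wikidata.group(1)} AS ?place)"
    for pattern, prop in GAZETTEER_PATTERNS:
        match = pattern.match(gazetteer_uri)
        if match:
            return f'?place wdt:{prop} "{match.group(1)}" .'
    raise ValueError(f"Unrecognized gazetteer URI: {gazetteer_uri}")
```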

Furthermore, a second SPARQL query is sent to Wikidata to get the geographic hierarchy and ingest simple RDF for these places as well. This makes it possible to query for all vases found in Lazio regardless of whether they have been linked directly to Vulci or Veii. Note: this hierarchy is based on modern administrative divisions, not historical boundaries (Vulci and Veii are historically in Etruria). It might be possible to use a combination of deposit date and place to derive a historical region once projects like the World-Historical Gazetteer become more developed with regard to both time and space.
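The hierarchy query behaves like following falls-within links transitively, as `crm:P89_falls_within+` does in SPARQL. A Python sketch over a simplified, illustrative hierarchy (plain labels rather than Wikidata URIs):

```python
def ancestors(place, parent_of):
    """All places `place` falls within, nearest first (like crm:P89_falls_within+)."""
    chain = []
    current = parent_of.get(place)
    while current is not None and current not in chain:  # guard against cycles
        chain.append(current)
        current = parent_of.get(current)
    return chain


def found_within(findspots, region, parent_of):
    """Objects whose findspot falls, directly or transitively, within `region`."""
    return sorted(obj for obj, spot in findspots.items()
                  if region in ancestors(spot, parent_of))
```

With `parent_of` mapping each findspot to Vulci or Veii and those upward to Lazio and Italy, `found_within(findspots, "Lazio", parent_of)` returns the vases from both sites.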

Transforming JSON-LD to CIDOC-CRM RDF/XML

After performing pre-processing URI reconciliation tasks, each Human-Made Object in the JSON response will be processed into RDF/XML. Much of the cruft that aids developers in creating human-readable interfaces will be eliminated, such as labels for entities and other sorts of textual statements. Date-times are converted into xsd:gYear. Relevant Getty (or other) URIs are mapped to the Kerameikos URIs that have been created so far. Measurements are converted to metric. In order to better conform to the way in which pottery specialists model and query information, several classifications are mapped into Kerameikos.org pottery-specific RDF properties rather than strictly following the Linked Art CIDOC CRM profile. The Kerameikos model is nearly identical to Linked Art, however, with the exception of the use of kon:hasShape (instead of a generic crm:P2_has_type for an object type) and kon:hasStyle (instead of an artistic genre of a Visual Item).
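The date and measurement normalizations might look like this in Python. The assumptions: dates arrive as xsd:dateTime literals (negative for BCE), and measurements carry a simple unit code from an illustrative table. The sketch keeps only the signed year and does not adjust for year-zero conventions, which differ between astronomical numbering and XSD 1.0.

```python
import re

# Conversion factors to centimetres (illustrative unit codes)
TO_CM = {"cm": 1.0, "mm": 0.1, "m": 100.0, "in": 2.54}


def to_gyear(datetime_literal):
    """Keep the signed year of an xsd:dateTime, e.g. '-0460-01-01T00:00:00Z' gives '-0460'."""
    match = re.match(r"^(-?\d{4,})-", datetime_literal)
    if not match:
        raise ValueError(f"Not an xsd:dateTime: {datetime_literal}")
    return match.group(1)


def to_cm(value, unit):
    """Convert a measurement to centimetres, rounded to 2 decimal places."""
    return round(float(value) * TO_CM[unit], 2)
```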

A final product (still a prototype, as the Linked Art data model is still evolving) can be seen here.

Joining Linked Art and ARIADNE

Many museum vases with recorded provenance cite only a place or site name, with no further context about the precise location within the site. Of course, modern excavations will have this level of detail, and the ARIADNE implementation of the CRMarchaeo extension is fully capable of exploiting this fine granularity. Our use cases are much simpler, and many coin findspots follow a similar pattern. However, some finds databases, such as the Portable Antiquities Scheme, might include more precise latitude and longitude as well as the lowest-level parish URI from Ordnance Survey. I think the ARIADNE-based find model should work for both use cases in Kerameikos and Nomisma.

I have put forth a proposal to the Linked Art community, https://github.com/linked-art/linked.art/issues/285, which has not yet received any feedback. It draws on the CRMsci and CRMgeo extensions. This proposal has been offered following consultation with ARIADNE data specialist Achille Felicetti, through an introduction by Holly Wright.

Things to note:

1. An HMO is linked via crmsci:O19i_was_object_found_by to an S19_Encounter_Event. This Encounter might involve individual agents, techniques (e.g., metal detecting, as defined in the English Heritage FISH taxonomy), and a place.

2. The place might have known geographic coordinates, but may not. This place might have additional context expressed by P2_has_type (e.g., a tomb, expressed by a Getty AAT URI). A findspot should always point to a parent place defined by a gazetteer URI. A findspot for a vase might be somewhere within Vulci, but is never Vulci directly.

3. I have decided to assign a second RDF class, http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing, to the crmgeo:SP5_Geometric_Place_Expression that encapsulates the WKT coordinates associated with an E53_Place. This opens the door to splitting the WKT point into geo:lat and geo:long properties, which are much more widely used within the broader LOD ecosystem than the CRMgeo extension. As a result, the E53_Place has two properties pointing to the same SpatialThing node, geo:location and crm:P168_place_is_defined_by, so the model remains conformant to CIDOC CRM.
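Splitting the WKT point is straightforward once you remember that WKT lists longitude before latitude. A Python sketch that handles simple POINT literals, optionally preceded by a GeoSPARQL-style CRS URI; the function name is hypothetical.

```python
import re

WKT_POINT = re.compile(
    r"^\s*(?:<[^>]+>\s*)?POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_lat_long(wkt):
    """Split a WKT POINT (x = longitude, y = latitude) into (geo:lat, geo:long)."""
    match = WKT_POINT.match(wkt)
    if not match:
        raise ValueError(f"Not a simple WKT POINT: {wkt}")
    longitude, latitude = (float(g) for g in match.groups())
    return latitude, longitude
```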

As discussed above, the JSON-LD harvesting workflow will normalize this gazetteer URI to a Wikidata Q entity, extracting skos:exactMatches, coordinates, and modern geographic hierarchy and ingest these into the Kerameikos SPARQL endpoint for query and visualization.

A getFindspots API has been implemented in Kerameikos, e.g., http://kerameikos.org/apis/getFindspots?id=stamnos, which yields GeoJSON serialized from a SPARQL query that gets all of the unique findspots for a particular concept.

Geographic distribution of stamnoi.
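A getFindspots-style serialization can be approximated as follows. This Python sketch assumes SPARQL JSON bindings with variables named ?place, ?label, ?lat and ?long (hypothetical names); the key detail is that GeoJSON positions are [longitude, latitude].

```python
import json


def findspots_geojson(bindings):
    """Serialize SPARQL JSON bindings as a GeoJSON FeatureCollection string."""
    features = []
    for b in bindings:
        if "lat" not in b or "long" not in b:
            continue  # places without coordinates cannot be mapped
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON positions are [longitude, latitude]
                "coordinates": [float(b["long"]["value"]), float(b["lat"]["value"])],
            },
            "properties": {"uri": b["place"]["value"], "name": b["label"]["value"]},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})
```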


A stamnos from Newfields is the first object in Kerameikos.org with a findspot (Vulci).

Due to the inherent hierarchy extracted from Wikidata, it is possible to query all vases found in the country of Italy, for example:


PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX crmsci: <http://www.ics.forth.gr/isl/CRMsci/>

SELECT ?object ?title WHERE {
  ?object crmsci:O19i_was_object_found_by ?encounter ;
          crm:P1_is_identified_by ?id .
  ?id crm:P2_has_type <http://vocab.getty.edu/aat/300404670> ;
      crm:P190_has_symbolic_content ?title .
  ?encounter a crmsci:S19_Encounter_Event ;
             crm:P7_took_place_at/crm:P89_falls_within+ <http://www.wikidata.org/entity/Q38>
}

This advancement is the tip of the iceberg for what's possible once we begin to aggregate a larger corpus of materials with archaeological context.

Friday, September 27, 2019

Slides for Kerameikos.org and the Linked Art showcase at the Victoria & Albert Museum

Next week I will be heading to Oxford for the AHRC-funded face-to-face meeting of the Linked Art scientific committee. On Tuesday there is a showcase workshop at the Victoria & Albert Museum that I sadly cannot attend, but I have put together a small slideshow with notes that I think Sami Norling from the Indianapolis Museum of Art at Newfields will read (since the Linked Art JSON-LD harvester is built around test examples from the IMA). The slides are as follows:

Kerameikos.org is an international project that seeks to define the intellectual concepts of ceramics studies following the principles of Linked Open Data. This phase of the project is funded by the US National Endowment for the Humanities and is focused primarily on creating URIs for Archaic and Classical Greek pottery concepts, which includes authoring definitions for shapes, artists, techniques, production places, etc. and linking them to equivalent entries in other LOD thesauri, such as the Getty and British Museum vocabularies and the Pleiades Gazetteer of Ancient Places. We are also aggregating vase data from partner collections as a proof of concept to facilitate new types of query and visualization. The emerging Linked Art community plays a significant role in this process.

 


The Indianapolis Museum of Art at Newfields has a small collection of Greek vases that have served as a test case for building a harvester that integrates Linked Art-compliant JSON-LD into Kerameikos' Linked Open Data ecosystem. The IMA vase pictured here, represented by a URI, is of a particular shape called a stamnos. It was painted by Hermonax, an Athenian artist, in the Red-figure technique in roughly the mid-5th century B.C.


 
In a test of JSON-LD provided by Sami Norling at the IMA, some minor modifications were made to fill in any gaps in cataloging with the relevant Getty Art & Architecture Thesaurus, Union List of Artist Names, and Thesaurus of Geographic Names identifiers. These URIs have equivalencies in Kerameikos.org and other systems.




The harvesting workflow parses the Linked Art JSON-LD and distills it into the most basic network graph, represented here as RDF/XML conforming to the underlying Linked Art profile of the CIDOC CRM ontology. The human-readable labels from the JSON, which may be useful to developers working directly with that format of data, are removed, since the preferred labels in English and other languages are already inherent to Kerameikos.org's own thesaurus data model.




After entering basic metadata about a dataset (in this case, the IMA's collection of Greek vases) and a link to the JSON-LD file on a web server (which will one day be a URL for an API response), the harvester will extract the JSON and process each human-made object into RDF/XML, replacing Getty URIs with Kerameikos ones, when applicable. After this completes, the RDF is published to the Kerameikos.org SPARQL endpoint. SPARQL is a query language for linked data, and the underlying triple database is the backbone of aggregation in this project, as well as Nomisma.org, a similar linked data project for numismatics.




After the workflow completes, the vases will immediately become available in the pages associated with concepts connected to the IMA's vases, for example kerameikos.org/id/stamnos or kerameikos.org/id/hermonax. This user interface can accommodate multiple JPEG images per vase, as well as IIIF services and several types of 3D models rendered in the 3DHOP library or the Sketchfab viewer.




By means of the relationship between the Getty Thesaurus of Geographic Names, Kerameikos.org place identifiers, and the Pleiades Gazetteer of Ancient Places, it is possible to build a transformation process that converts Linked Art RDF into a different RDF data model required by the Pelagios Network. Kerameikos.org is now a data hub for Pelagios, and currently about 200 Greek vases from 6 partners are available in the Peripleo explorer. This number will grow into the thousands in the coming months and years as the full range of British Museum, Getty, and archaeological pottery data is integrated into Kerameikos.

In conclusion, as the Linked Art standard begins to proliferate throughout the museum community, harvesting will be greatly simplified by having one set of APIs and models that can be applied broadly across many museum or archaeological databases, rather than relying on intermediate processes of OpenRefine data cleaning and spreadsheet-to-RDF transformation with one-off programming scripts.