Friday, March 19, 2021

The Fitzwilliam Museum Attic vases aggregated into Kerameikos

With API access granted to the Fitzwilliam Museum collection by Dan Pett, I was able to spend a few hours yesterday evening writing a script to query relevant terracotta objects produced in Athens from the database. After loading these into OpenRefine, I eliminated object types that are not vases (e.g., figurines or architectural fragments), which resulted in a total of about 700 objects from the Fitzwilliam integrated into Kerameikos.org's SPARQL endpoint for query and visualization.

The Fitzwilliam Museum page on Kerameikos

The various concepts used in cataloging the Fitz collection were reconciled in OpenRefine to Kerameikos URIs, and like the British Museum data, more than 60 findspots were aligned with Wikidata.org URIs, making it possible to visualize the geographic distribution of relevant concepts.

The "Objects of the Typology" section of each page only shows those items with photographs, but the geographic and distribution analyses include all relevant objects. Eventually, I will implement CSV exports for data so that they can be more easily reused in other platforms.

A quick analysis of the Fitzwilliam's collection shows Black-figure lekythoi are prominent compared to other Black- and Red-figure objects.

See results here.


Wednesday, March 17, 2021

More than 4,200 British Museum Athenian vases integrated into Kerameikos.org

In a major leap forward, more than 4,200 Athenian Greek vases (primarily Archaic and Classical) have been linked to Kerameikos.org URIs and integrated into the SPARQL endpoint for query and visualization.

A large quantity of relevant painters and potters (about 300) have been published to Kerameikos.org in recent months. While work remains to create URIs and definitions for the remaining notable Classical Red-figure painters and to fill in the gaps of relatively non-notable entities from the Beazley Archive Pottery Database vocabularies, the number of artists that exist as entities within the Kerameikos LOD ecosystem is great enough to begin the process of aggregating open access museum collections.

The first choice among these is the British Museum. Using the new Collections database, more than 5,000 Athenian ceramic objects (full vases and fragments) were exported as a CSV and then normalized and reconciled to Kerameikos.org URIs through our own OpenRefine API.

As a result, 4,200 vases from the BM have been linked to Kerameikos.org shapes (the lowest barrier to entry in the system). Other concepts, such as painter/potter, production place, time period, and technique, were linked to Kerameikos.org as well (although our coverage is not yet complete in these areas). Furthermore, about three-fourths of these objects have findspots in the BM database, which were normalized to the lowest-level geographic entity represented by Wikidata.org, representing more than 100 different places. The excavation pottery is easiest to spot, with hundreds of artifacts coming from Kameiros, Rhodes and Naucratis, Egypt (see http://kerameikos.org/id/british_museum). These reconciled findspots were queried with the Wikidata SPARQL endpoint to extract a fuller geographic hierarchy (as well as matching URIs in Pleiades, the Getty Thesaurus of Geographic Names, and Geonames.org), making it possible to query all of the objects found in the modern region of Etruria or the country of Turkey. For example, the query below will get all objects found in Italy (Q38):


SELECT ?object ?place ?placeLabel ?lat ?long WHERE {
  ?object crmsci:O19i_was_object_found_by ?encounter .
  ?encounter crm:P7_took_place_at/crm:P89_falls_within ?place .
  ?place crm:P89_falls_within+ <http://www.wikidata.org/entity/Q38> ;
         geo:location ?loc ;
         rdfs:label ?placeLabel .
  ?loc geo:lat ?lat ;
       geo:long ?long
} LIMIT 100
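The same query can be parameterized for any Wikidata region. The following is a minimal Python sketch, not the code used by Kerameikos: the function name is hypothetical, and the PREFIX declarations are assumptions based on the commonly published namespaces for CIDOC CRM, CRMsci, and the WGS84 geo vocabulary, since the post does not list them.

```python
import re

# Assumed namespace URIs; the post uses the prefixes without declaring them.
PREFIXES = """\
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX crmsci: <http://www.ics.forth.gr/isl/CRMsci/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
"""


def findspot_query(qid: str, limit: int = 100) -> str:
    """Return a SPARQL query for objects found within a Wikidata entity."""
    # Reject anything that is not a plain Q identifier before it reaches the query.
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata Q identifier: {qid!r}")
    region = f"<http://www.wikidata.org/entity/{qid}>"
    body = f"""SELECT ?object ?place ?placeLabel ?lat ?long WHERE {{
  ?object crmsci:O19i_was_object_found_by ?encounter .
  ?encounter crm:P7_took_place_at/crm:P89_falls_within ?place .
  ?place crm:P89_falls_within+ {region} ;
         geo:location ?loc ;
         rdfs:label ?placeLabel .
  ?loc geo:lat ?lat ;
       geo:long ?long
}} LIMIT {int(limit)}"""
    return PREFIXES + body
```

Calling findspot_query("Q38") reproduces the Italy query above, and findspot_query("Q43") would target Turkey instead.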


So based on the data we have, we can map production places and findspots associated with any sort of concept defined by Kerameikos.

Distribution of the Berlin Painter.

Now that we have significantly more data in the system (despite nearly all of it coming from a single source), geographic and distribution analysis visualizations begin to look a bit more accurate. This is not a full picture, but it is a pretty clear demonstration of the sorts of research tools that are possible when Linked Open Data methods are applied to Greek pottery.

Black Figure technique distribution among BM data.

Below the example objects (which are now paginated in groups of 48), the distribution analysis chart can be generated nearly instantaneously. Here's a distribution of Black-figure shapes:

Black Figure distribution of shapes

The results of these queries can be downloaded as CSV or opened in a new page for bookmarking and citation, or refined further to compare different sets of data. As you can see, there are far more Black- than Red-figure lekythoi.


A note about the BM images

The CSV export from the British Museum includes a single column for an image. Most images published by the BM follow the IIIF Image API protocol, but some of them are static JPEGs on the server. Until the BM is able to make IIIF manifests available, I need to implement better validation to distinguish IIIF from non-IIIF images.
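One possible approach to that validation is to test whether an image URL follows the IIIF Image API request syntax (region/size/rotation/quality.format) or points to an info.json document. The sketch below is only a heuristic under that assumption: the function name and the example URLs are hypothetical, and it does not check what the server actually returns.

```python
import re
from urllib.parse import urlsplit

# Trailing path segments of an IIIF Image API request:
# .../{region}/{size}/{rotation}/{quality}.{format}
IIIF_IMAGE_RE = re.compile(
    r"/(?:full|square|(?:pct:)?[\d.]+,[\d.]+,[\d.]+,[\d.]+)"      # region
    r"/(?:full|\^?max|\^?pct:[\d.]+|\^?!?(?:\d+,\d*|,\d+))"       # size
    r"/!?[\d.]+"                                                  # rotation
    r"/(?:color|gray|bitonal|default|native)"                     # quality
    r"\.(?:jpg|tif|png|gif|jp2|pdf|webp)$"                        # format
)


def is_iiif_image_url(url: str) -> bool:
    """Guess whether a URL is an IIIF image request or an info.json document."""
    path = urlsplit(url).path
    return path.endswith("/info.json") or bool(IIIF_IMAGE_RE.search(path))


# Hypothetical URLs for illustration only.
print(is_iiif_image_url("https://example.org/iiif/abc123/full/400,/0/default.jpg"))
print(is_iiif_image_url("https://example.org/images/vase_001.jpg"))
```

A static JPEG fails the pattern because it lacks the four trailing parameter segments, so it can be routed to a plain image display rather than an IIIF viewer.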


Friday, October 25, 2019

Linked Art data harvesting and aligning to ARIADNE for archaeological context

As mentioned in the related numismatic blog post, First pass at processing Linked Art JSON-LD to Nomisma RDF, and the slides presented by the Smithsonian's Adam Soroka on my behalf at the Linked Art showcase last month at the Victoria and Albert Museum in London, Linked Art JSON-LD harvesting is now functional in the kerameikos.org back-end. Built around test data provided by Sami Norling at the Indianapolis Museum of Art at Newfields and supplemented with some additional properties and Getty URIs, JSON-LD is processed by the XForms engine in Orbeon (which powers both the Nomisma and Kerameikos frameworks). Getty vocabulary URIs are mapped to applicable Kerameikos ones, and the JSON-LD is distilled into its essential graph form as RDF/XML and posted into the Kerameikos SPARQL endpoint.

For each JSON-LD GET operation, the following three tasks are initiated:

Automatic reconciliation of URIs to Kerameikos

Distinct entities related to each vase (shapes, materials, styles, techniques, artists, production places, etc.) are aggregated into a list. A SPARQL query is executed for each one (that isn't already a Kerameikos URI) in order to get the equivalent Kerameikos URI via skos:exactMatch. These mappings are stored so that SPARQL queries do not need to be executed multiple times for the same URI.
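The caching behavior described here can be sketched in a few lines of Python. This is an illustration rather than the XForms implementation: the function names are hypothetical, and the actual SPARQL lookup is passed in as a callable so that the sketch stays independent of any particular endpoint.

```python
from typing import Callable, Dict, Iterable, Optional

KERAMEIKOS_NS = "http://kerameikos.org/id/"


def reconcile(
    uris: Iterable[str],
    lookup: Callable[[str], Optional[str]],
    cache: Optional[Dict[str, Optional[str]]] = None,
) -> Dict[str, Optional[str]]:
    """Map each URI to its Kerameikos equivalent, querying once per distinct URI.

    `lookup` stands in for the skos:exactMatch SPARQL query and returns the
    matching Kerameikos URI, or None when there is no match.
    """
    cache = {} if cache is None else cache
    result: Dict[str, Optional[str]] = {}
    for uri in uris:
        if uri.startswith(KERAMEIKOS_NS):
            # Already a Kerameikos URI: no query needed.
            result[uri] = uri
            continue
        if uri not in cache:
            cache[uri] = lookup(uri)
        result[uri] = cache[uri]
    return result
```

Passing the same cache dictionary across harvests means that a URI seen in an earlier dataset is never queried again, including URIs that had no match.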

Normalizing findspot URIs to Wikidata entities

URIs for findspots, following the proposed ARIADNE Plus data model (more details below), which can be Geonames, Pleiades, Getty Thesaurus of Geographic Names, Ordnance Survey, and Wikidata, are queried in the Kerameikos endpoint to see if they have already been normalized and harvested. If not, then a SPARQL query is sent to the Wikidata.org endpoint in order to find the related Wikidata Q entity for the gazetteer URI. The Wikidata entity URI therefore serves as the primary URI scheme for findspots, regardless of which gazetteer a dataset may use locally. The SPARQL query will also gather the skos:exactMatch URIs from the Getty TGN, Pleiades, Ordnance Survey, and Geonames, when available, and extract latitudes and longitudes.

CONSTRUCT {
  ?place a skos:Concept ;
         skos:prefLabel ?placeLabel ;
         skos:exactMatch ?osgeo ;
         skos:exactMatch ?tgn ;
         skos:exactMatch ?geonames ;
         skos:exactMatch ?pleiades ;
         dct:coverage ?coord .
}
WHERE {
  ?place wdt:P1667 "7015539" . # TGN ID for Vulci
  OPTIONAL { ?place wdt:P3120 ?osgeoid .
    BIND (uri(concat("http://data.ordnancesurvey.co.uk/id/", ?osgeoid)) AS ?osgeo) }
  OPTIONAL { ?place wdt:P1667 ?tgnid .
    BIND (uri(concat("http://vocab.getty.edu/tgn/", ?tgnid)) AS ?tgn) }
  OPTIONAL { ?place wdt:P1566 ?geonamesid .
    BIND (uri(concat("http://sws.geonames.org/", ?geonamesid, "/")) AS ?geonames) }
  OPTIONAL { ?place wdt:P1584 ?pleiadesid .
    BIND (uri(concat("https://pleiades.stoa.org/places/", ?pleiadesid)) AS ?pleiades) }
  OPTIONAL { ?place p:P625/ps:P625 ?coord }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en"
  }
}
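The first triple pattern in the WHERE clause changes depending on which gazetteer the source dataset uses. A minimal sketch of that selection follows, reusing the URI bases and Wikidata properties that appear in the query above; the function name is hypothetical, and a real harvester would likely also need to tolerate http/https and trailing-slash variants beyond what is handled here.

```python
import re

# URI base -> Wikidata property holding that gazetteer's identifier,
# taken from the BIND statements in the CONSTRUCT query.
GAZETTEERS = [
    ("http://vocab.getty.edu/tgn/", "P1667"),
    ("http://sws.geonames.org/", "P1566"),
    ("https://pleiades.stoa.org/places/", "P1584"),
    ("http://data.ordnancesurvey.co.uk/id/", "P3120"),
]


def wikidata_lookup_pattern(gazetteer_uri: str) -> str:
    """Return the SPARQL triple pattern that finds the Wikidata entity for a URI."""
    for base, prop in GAZETTEERS:
        if gazetteer_uri.startswith(base):
            local_id = gazetteer_uri[len(base):].strip("/")
            # Keep only simple identifiers so nothing unexpected enters the query.
            if not re.fullmatch(r"[A-Za-z0-9_.-]+", local_id):
                raise ValueError(f"unexpected identifier in {gazetteer_uri!r}")
            return f'?place wdt:{prop} "{local_id}" .'
    raise ValueError(f"unsupported gazetteer URI: {gazetteer_uri!r}")


print(wikidata_lookup_pattern("http://vocab.getty.edu/tgn/7015539"))
```

For the Vulci TGN URI this yields the same pattern used in the query above.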

Furthermore, a second SPARQL query is sent to Wikidata to get the geographic hierarchy and ingest simple RDF for these places as well. This makes it possible to query for all vases found in Lazio regardless of whether they have been linked directly to Vulci or Veii. Note: this hierarchy is based on modern administrative divisions, not historical boundaries (Vulci and Veii are historically in Etruria). It might be possible to use a combination of deposit date and place to derive a historical region once projects like the World-Historical Gazetteer become more developed with regard to both time and space.
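The effect of ingesting the hierarchy can be illustrated without a triple store. The sketch below mimics a transitive "falls within" lookup over a toy parent map; the place names stand in for URIs, the hierarchy is deliberately abbreviated (the real Wikidata chain has intermediate administrative levels), and the object identifiers are invented for the example.

```python
from typing import Dict, List, Set

# Toy, abbreviated hierarchy: child place -> parent places.
TOY_PARENTS: Dict[str, List[str]] = {
    "Vulci": ["Lazio"],
    "Veii": ["Lazio"],
    "Lazio": ["Italy"],
    "Naucratis": ["Egypt"],
}

# Invented objects and their findspots.
TOY_FINDSPOTS: Dict[str, str] = {
    "vase-a": "Vulci",
    "vase-b": "Veii",
    "vase-c": "Naucratis",
}


def ancestors(place: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every place reachable by following parent links one or more times."""
    found: Set[str] = set()
    stack = list(parents.get(place, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, ()))
    return found


def objects_found_within(
    region: str, findspots: Dict[str, str], parents: Dict[str, List[str]]
) -> Set[str]:
    """Return the objects whose findspot lies anywhere beneath the region."""
    return {obj for obj, place in findspots.items() if region in ancestors(place, parents)}


print(sorted(objects_found_within("Lazio", TOY_FINDSPOTS, TOY_PARENTS)))
```

Both vase-a and vase-b are returned for Lazio even though neither is linked to Lazio directly, which is the behavior the ingested hierarchy provides in SPARQL.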

Transforming JSON-LD to CIDOC-CRM RDF/XML

After performing pre-processing URI reconciliation tasks, each Human-Made Object in the JSON response will be processed into RDF/XML. Much of the cruft that aids developers in creating human-readable interfaces will be eliminated, such as labels for entities and other sorts of textual statements. Date-times are converted into xsd:gYear. Relevant Getty (or other) URIs are mapped to Kerameikos URIs that have been created so far. Measurements are converted to metric. In order to better conform to the way in which pottery specialists model and query information, several classifications are mapped into Kerameikos.org pottery-specific RDF properties rather than following the Linked Art CIDOC CRM profile explicitly. The Kerameikos model is nearly identical to Linked Art, however, with the exception of the use of kon:hasShape (instead of a generic crm:P2_has_type for an object type) and kon:hasStyle instead of an artistic genre of a Visual Item.
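Two of these normalizations, date-times to xsd:gYear and measurements to metric, are simple enough to sketch in Python. This is not the harvester's XForms code: the function names and the short unit keys are hypothetical (Linked Art identifies units with vocabulary URIs), and the year conversion copies the sign and digits unchanged, which assumes the source and target use the same year-numbering convention for BCE dates.

```python
import re

# Conversion factors to centimetres; the unit keys are illustrative shorthand.
TO_CM = {"cm": 1.0, "mm": 0.1, "m": 100.0, "in": 2.54, "ft": 30.48}


def to_gyear(timestamp: str) -> str:
    """Reduce an ISO 8601 date or date-time to its year, keeping a leading minus."""
    match = re.match(r"^(-?)(\d{4,})-\d{2}-\d{2}", timestamp)
    if match is None:
        raise ValueError(f"not an ISO 8601 date: {timestamp!r}")
    return match.group(1) + match.group(2)


def to_centimetres(value: float, unit: str) -> float:
    """Convert a measurement to centimetres, rounded to two decimal places."""
    if unit not in TO_CM:
        raise ValueError(f"unknown unit: {unit!r}")
    return round(value * TO_CM[unit], 2)


print(to_gyear("-0460-01-01T00:00:00Z"))
print(to_centimetres(10, "in"))
```

Rounding after conversion avoids carrying floating-point noise into the RDF literal.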

A final product (still a prototype, as the Linked Art data model is still evolving) can be seen here.

Joining Linked Art and ARIADNE

Many vases in museums that have provenance include a citation to the place/site name alone with no further context about the precise location within a site. Of course, modern excavations will have this level of detail, and the ARIADNE implementation of the CRMarchaeo extension is fully capable of exploiting this fine granularity. Our use cases are much simpler, and many coin findspots follow a relevant pattern. However, some finds databases, such as the Portable Antiquities Scheme, might include more precise latitude and longitude as well as the lowest-level parish URI from Ordnance Survey. I think the ARIADNE-based find model should work for both use cases in Kerameikos and Nomisma.

I have put forth a proposal to the Linked Art community, https://github.com/linked-art/linked.art/issues/285, which has not yet received any feedback. It includes some extensions with the CRMsci and CRMgeo ontologies. This proposal was drafted following consultation with ARIADNE data specialist Achille Felicetti, through an introduction by Holly Wright.

Things to note:

1. An HMO is sci:O19i_was_object_found_by an S19_Encounter_Event. This Encounter might involve individual agents, techniques (metal detecting, as defined in the English Heritage FISH taxonomy), and a place.

2. The place might have known geographic coordinates, but may not. This place might have additional context expressed by P2_has_type (e.g., a tomb, expressed by a Getty AAT URI). A findspot should always point to a parent place defined by a gazetteer URI. A findspot for a vase might be somewhere within Vulci, but is never Vulci directly.

3. I have decided to insert a second RDF class for the crmgeo:SP5_Geometric_Place_Expression that encapsulates the WKT coordinates associated with an E53_Place: http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing. This opens the door to splitting the WKT point into geo:lat and geo:long properties, which are much more widely used within the broader LOD ecosystem than the CRMgeo extension. This means the E53_Place has two properties pointing to the same SpatialThing node, a geo:location and a crm:P168_place_is_defined_by, meaning the model remains conformant to CIDOC CRM.

As discussed above, the JSON-LD harvesting workflow will normalize this gazetteer URI to a Wikidata Q entity, extracting skos:exactMatches, coordinates, and modern geographic hierarchy and ingesting these into the Kerameikos SPARQL endpoint for query and visualization.
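Splitting a WKT point into latitude and longitude, as mentioned in the third note, might look like the following Python sketch. The function name is hypothetical and the coordinates are approximate illustrative values; the one detail worth highlighting is that WKT orders a point as x then y, so longitude comes first.

```python
import re
from typing import Tuple

WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def split_wkt_point(wkt: str) -> Tuple[float, float]:
    """Return (latitude, longitude) from a WKT POINT, which is written lon lat."""
    match = WKT_POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {wkt!r}")
    return lat, lon


# Approximate, illustrative coordinates only.
lat, lon = split_wkt_point("Point(11.63 42.42)")
print(lat, lon)
```

The returned pair maps directly onto geo:lat and geo:long on the SpatialThing node.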

A getFindspots API has been implemented in Kerameikos, e.g., http://kerameikos.org/apis/getFindspots?id=stamnos, which yields GeoJSON serialized from a SPARQL query that gets all of the unique findspots for a particular concept.
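A consumer of this API could pull the point coordinates out of the response with nothing more than the standard library. The sketch below assumes only standard GeoJSON structure (a FeatureCollection or single Feature with Point geometries); the function name and the sample document are hypothetical, and no assumption is made about the feature properties that Kerameikos includes.

```python
import json
from typing import List, Tuple


def findspot_points(geojson_text: str) -> List[Tuple[float, float]]:
    """Return (latitude, longitude) pairs for every Point feature in the GeoJSON."""
    data = json.loads(geojson_text)
    if data.get("type") == "FeatureCollection":
        features = data.get("features", [])
    else:
        features = [data]
    points: List[Tuple[float, float]] = []
    for feature in features:
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Point":
            # GeoJSON positions are ordered longitude, latitude.
            lon, lat = geometry["coordinates"][:2]
            points.append((lat, lon))
    return points


# Hand-written sample document, not an actual API response.
SAMPLE = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": {"type": "Point", "coordinates": [11.63, 42.42]}, "properties": {}},
        {"type": "Feature", "geometry": None, "properties": {}},
    ],
})
print(findspot_points(SAMPLE))
```

Features without a geometry are skipped rather than treated as errors, since a findspot may lack known coordinates.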

Geographic distribution of stamnoi.


A stamnos from Newfields is the first object in Kerameikos.org with a findspot (Vulci).

Due to the inherent hierarchy extracted from Wikidata, it is possible to query all vases found in the country of Italy, for example:


SELECT ?object ?title WHERE {  
  ?object crmsci:O19i_was_object_found_by ?encounter ;
          crm:P1_is_identified_by ?id .
  ?id crm:P2_has_type <http://vocab.getty.edu/aat/300404670> ;
      crm:P190_has_symbolic_content ?title .
  ?encounter a crmsci:S19_Encounter_Event ;
               crm:P7_took_place_at/crm:P89_falls_within+ <http://www.wikidata.org/entity/Q38>
}

This advancement is the tip of the iceberg for what's possible once we begin to aggregate a larger corpus of materials with archaeological context.

Friday, September 27, 2019

Slides for Kerameikos.org and the Linked Art showcase at the Victoria & Albert Museum

Next week I will be heading to Oxford for the AHRC-funded face-to-face meeting for the Linked Art scientific committee. On Tuesday is a showcase workshop at the Victoria & Albert Museum that I sadly cannot attend, but I have put together a small slideshow with notes that I think Sami Norling from the Indianapolis Museum of Art at Newfields will read (since the Linked Art JSON-LD harvester is built around test examples from the IMA). The slides are as follows:

Kerameikos.org is an international project that seeks to define the intellectual concepts of ceramics studies following the principles of Linked Open Data. This phase of the project is funded by the US National Endowment for the Humanities and is focused primarily on creating URIs for Archaic and Classical Greek pottery concepts, which includes authoring definitions for shapes, artists, techniques, production places, etc. and linking them to equivalent entries in other LOD thesauri, such as the Getty and British Museum vocabularies and the Pleiades Gazetteer of Ancient Places. We are also aggregating vase data from partner collections as a proof of concept to facilitate new types of query and visualization. The emerging Linked Art community plays a significant role in this process.

 


The Indianapolis Museum of Art at Newfields has a small collection of Greek vases that have served as a test case for building a harvester that integrates Linked Art-compliant JSON-LD into Kerameikos' Linked Open Data ecosystem. This vase pictured here in the IMA, represented by a URI, is a particular shape called a stamnos. It was painted by Hermonax, an Athenian artist, in the Red-figure technique in roughly the mid-5th century B.C.


 
In a test of JSON-LD provided by Sami Norling at the IMA, some minor modifications were made to fill in any gaps in cataloging with the relevant Getty Art & Architecture Thesaurus, Union List of Artist Names, and Thesaurus of Geographic Names identifiers. These URIs have equivalencies in Kerameikos.org and other systems.




The harvesting workflow parses the Linked Art JSON-LD and distills it into the most basic network graph, represented here as RDF/XML conforming to the underlying Linked Art profile in the CIDOC CRM ontology. The human-readable labels from the JSON, which may be useful to developers working directly with that format of data, are removed, since the preferred labels in English and other languages are already inherent to Kerameikos.org's own thesaurus data model.




After entering basic metadata about a dataset (in this case, the IMA's collection of Greek vases) and a link to the JSON-LD file on a web server (which will one day be a URL for an API response), the harvester will extract the JSON and process each human-made object into RDF/XML, replacing Getty URIs with Kerameikos ones, when applicable. After this completes, the RDF is published to the Kerameikos.org SPARQL endpoint. SPARQL is a query language for linked data, and the underlying triple database is the backbone of aggregation in this project, as well as Nomisma.org, a similar linked data project for numismatics.




After the workflow completes, the vases will immediately become available in the pages associated with concepts connected to the IMA's vases, for example kerameikos.org/id/stamnos or kerameikos.org/id/hermonax. This user interface can accommodate multiple jpeg images per vase, as well as IIIF services and several types of 3D models rendered in the 3D Hop library or the Sketchfab viewer.




By means of the relationship between the Getty Thesaurus of Geographic Names, Kerameikos.org place identifiers, and the Pleiades Gazetteer of Ancient Places, it is possible to build a transformation process that converts Linked Art RDF into a different RDF data model required by the Pelagios Network. Kerameikos.org is now a data hub for Pelagios, and currently about 200 Greek vases from 6 partners are available in the Peripleo explorer. This number will grow into the thousands in the coming months and years as the full range of British Museum, Getty, and archaeological pottery are integrated into Kerameikos.

In conclusion, as the Linked Art standard begins to proliferate throughout the museum community, harvesting will be greatly simplified by having one set of APIs and models that can be applied broadly across many museum or archaeological databases, rather than relying on intermediate processes of OpenRefine data cleaning and spreadsheet-to-RDF transformation with one-off programming scripts.

Monday, September 23, 2019

First German translations added to Kerameikos

The first German translations for Kerameikos.org-published Greek pottery shapes have been published online through Kerameikos' spreadsheet import mechanism. These translations include preferred labels, alternative labels, and definitions. They were provided by Nicole High-Steskal and Laura Rembart of the Austrian Archaeological Institute. Nicole has recently moved to the Digital Lab at the Image Science department at the Danube University Krems.

DOIs for these intellectual contributions to Kerameikos.org will be created for Nicole and Laura soon.

Friday, September 20, 2019

Aligning Kerameikos more directly with Linked Art

I have been steadily developing a prototype data harvester that will perform some minor alterations to Linked Art-compliant JSON-LD in the XForms backend in order to ingest museum data into Kerameikos.org's SPARQL endpoint. I will write more details later as I complete the prototype (it will be ready for demonstration in time for the Linked Art Face to Face meeting in Oxford in two weeks), but in the course of testing the harvest process on some example JSON-LD from the Indianapolis Museum of Art, I have transitioned the Kerameikos RDF data model to adhere more strictly to the Linked Art profile.

The original data model developed for Kerameikos was a simplified CIDOC-CRM based on examples from the British Museum and feedback from Ontotext's Vladimir Alexiev. The focus of the model was to capture properties directly linked to various categories of Kerameikos SKOS concepts (artists, production places, techniques, shapes, etc.), with only a handful of literals encoded more simply in dcterms than the CRM approach (e.g., for title and accession number, dcterms:title and dcterms:identifier). Several properties and classes were created in a Kerameikos.org ontology in order to fill gaps in CRM modeling and/or more accurately represent the way in which pottery scholars organize knowledge within their own discipline as opposed to a more general art museum approach. These Kerameikos properties still exist within the hybrid Linked Art data model since a category like "Shape" is more easily and logically connected via a kon:hasShape property rather than creating types of types.

Paging through Indianapolis Museum of Art photos for a vase of Hermonax.

A summary of changes is as follows:
  • Title and Accession number are linked via crm:P1_is_identified_by, which have different classes and types defined by AAT URIs.
  • Static images and IIIF services are Visual Items linked via crm:P138i_has_representation, replacing foaf:depiction. More than one can be accommodated. Thumbnails (foaf:thumbnail) have been tabled until Linked Art develops a stable model for representing more than one size for the same photograph.
  • 3D model links are also crm:P138i_has_representation. The Visual Items are given relevant dcterms:formats.
  • The IIIF manifest is linked as an Information Object with the crm:P138i_has_representation property.
  • The IIIF/3D model updates have resulted in deprecation of the old Europeana Data Model specification.
  • Kerameikos implements the model for dimensions, which are converted to metric in the harvester.


These model changes have necessitated writing a simple XSLT identity transformation to generate new static RDF/XML files for our test vase data as well as updates to the underlying SPARQL queries for objects related to SKOS Concepts and the Pelagios data export. Since the new model can accommodate multiple images per vase (with the dcterms:format/dcterms:conformsTo for the Visual Item being necessary to generate UI distinctions for static images vs. IIIF vs. 3D model display in Sketchfab or 3DHop), I switched the queries from SELECT to CONSTRUCT and updated the XSLT to serialize the results to HTML or RDF (for Pelagios) from an RDF/XML model instead of the SPARQL XML response.

CONSTRUCT {?object a crm:E22_Man-Made_Object ;
            dcterms:title ?title ;
            dcterms:identifier ?id ;
            dcterms:publisher ?keeper; 
            crm:P138i_has_representation ?representation ;
            crm:P129i_is_subject_of ?manifest .
          ?representation dcterms:format ?format ;
            dcterms:conformsTo ?conformsTo} WHERE {
%STATEMENTS%
?object crm:P1_is_identified_by ?id1 ;
    crm:P1_is_identified_by ?id2 .
?id1 a crm:E33_E41_Linguistic_Appellation ;
    crm:P190_has_symbolic_content ?title .
?id2 a crm:E42_Identifier ;
    crm:P190_has_symbolic_content ?id .
OPTIONAL {?object crm:P50_has_current_keeper/skos:prefLabel ?keeper .
    FILTER (langMatches(lang(?keeper), "en"))} 
OPTIONAL {?object crm:P138i_has_representation ?representation
    OPTIONAL {?representation dcterms:format ?format}
    OPTIONAL {?representation dcterms:conformsTo ?conformsTo}}
OPTIONAL {?object crm:P129i_is_subject_of ?manifest}
}


Friday, September 6, 2019

Kerameikos is now a functioning Pelagios Network hub

As per the specifications we outlined in our National Endowment for the Humanities Digital Humanities Advancement Grant application, Kerameikos.org is now a functioning data hub for the Pelagios Network. Like Nomisma.org, objects aggregated into the Kerameikos SPARQL endpoint can be output in the Pelagios Open Annotation-based RDF model with a SPARQL query response that is piped through XSLT into RDF/XML. The export model includes some references to IIIF services for a few vases from Harvard Art Museums (as a proof of concept).

While there are over 300 vases in the Kerameikos SPARQL endpoint at the moment, the export includes just under 200 objects that are currently connected to Pleiades URIs through skos:exactMatch with Kerameikos place URIs. In our initial prototype from 2014, a few dozen vases from the Getty Museum were encoded in Getty TGN URIs and British Museum vases were linked to the BM's internal place thesaurus. Using Kerameikos as a bridge between vocabulary systems, the SPARQL query for the Pelagios output includes all vases linked directly to a Kerameikos URI as a production place (?object crm:P108i_was_produced_by/crm:P7_took_place_at ?place) as well as vases linked to any URI that is a skos:exactMatch for a Kerameikos URI. The Pleiades URI is then extracted into the ?match variable.

Coverage of Kerameikos partners in Peripleo.

?object crm:P108i_was_produced_by/crm:P7_took_place_at ?place .
{?place skos:exactMatch ?match FILTER strStarts(str(?match), "https://pleiades")}
UNION {?place^skos:exactMatch ?kid .
  ?kid skos:inScheme kid: ;
       skos:exactMatch ?match FILTER strStarts(str(?match), "https://pleiades")}
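The two branches of this UNION can be paraphrased in plain Python to show how Kerameikos acts as the bridge. This is an in-memory illustration only: the function name is hypothetical, and the TGN and Pleiades identifiers in the sample data are placeholders rather than real IDs.

```python
from typing import Dict, Iterable, Set

PLEIADES_PREFIX = "https://pleiades"


def pleiades_matches(
    place: str,
    exact_match: Dict[str, Set[str]],
    kerameikos_ids: Iterable[str],
) -> Set[str]:
    """Find Pleiades URIs for a production place, directly or via a Kerameikos URI."""
    # Branch 1: the place itself has a skos:exactMatch to Pleiades.
    matches = {m for m in exact_match.get(place, ()) if m.startswith(PLEIADES_PREFIX)}
    # Branch 2: a Kerameikos concept lists the place as an exactMatch,
    # so that concept's Pleiades matches apply to the place as well.
    for kid in kerameikos_ids:
        targets = exact_match.get(kid, set())
        if place in targets:
            matches.update(m for m in targets if m.startswith(PLEIADES_PREFIX))
    return matches


# Placeholder identifiers for illustration only.
SAMPLE_MATCHES = {
    "http://kerameikos.org/id/athens": {
        "http://vocab.getty.edu/tgn/0000000",
        "https://pleiades.stoa.org/places/000000",
    }
}
print(pleiades_matches("http://vocab.getty.edu/tgn/0000000", SAMPLE_MATCHES, SAMPLE_MATCHES.keys()))
```

A vase cataloged with the TGN URI and a vase cataloged with the Kerameikos URI both resolve to the same Pleiades place, which is what allows datasets using different gazetteers to appear together in Peripleo.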


The partners whose vases have been integrated into Peripleo include the British Museum, Getty Museum, Ure Museum at the University of Reading, Fralin Museum at the University of Virginia, Indianapolis Museum of Art at Newfields, and the Harvard Art Museums. We expect the list of contributors to grow as more museums and archaeological datasets become part of the Kerameikos Linked Open Data cloud and as we begin to expand our geographic coverage (which is extremely limited at the moment, with URIs created for only a small handful of places).