Friday, August 22, 2025

New content added to Kerameikos.org

After a four-year hiatus in adding content to Kerameikos.org, the project is rebooting for a new phase. Although Kerameikos was the subject of presentations and articles in 2023 and 2024, the last content entered into the vocabulary was Nick Harokopos' translations of shape definitions into Greek. Recently, University of Virginia student Matan Goldstein added Hebrew translations of the shape definitions.

Today, more than 950 Attic vases from the Ashmolean Museum at the University of Oxford were integrated into the Greek pottery Linked Open Data cloud. The new Ashmolean collections database enables CSV export, which was reconciled in OpenRefine to Kerameikos URIs for shapes, techniques, artists, etc. (using the reconciliation APIs that we built for Kerameikos). I should note that only the Attic vases were exported (almost entirely Archaic and Classical), not all objects from the ancient Greek world, so much remains to be integrated from the museum. Additionally, only Black-figure painters and potters have been rigorously defined. There is a smattering of Red-figure artists in Kerameikos from our 2014 prototype (Berlin Painter, Achilles Painter, and a few others), but we have not yet minted URIs for the whole range of Attic Red-figure artists. Therefore, some Ashmolean vases are not yet linked to their artists, because the corresponding artist URIs have not been created.

An example of a Kylix Type A from the Ashmolean: 
AN1974.344
 

More than 260 of these vases link to approximately 60 distinct place URIs, defined by Wikidata, improving the geographic visualization of related concepts.

Kerameikos concept pages link to a single Ashmolean example image, although the Ashmolean's IIIF manifests include multiple photographs. However, there appears to be an issue with the museum's IIIF image server delivering tiles, so displaying the full set of photographs will have to wait until that technical issue is resolved.

Lastly, we have migrated the canonical URIs for Kerameikos concepts and ontology classes and properties (which extend CIDOC-CRM ones) to https. If you are using these URIs in your own database, we recommend replacing http with https. 
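For a local database, the substitution can be as simple as a prefix rewrite. The Python sketch below assumes that every canonical Kerameikos URI begins with http://kerameikos.org/ (as http://kerameikos.org/id/athens does) and leaves URIs from other domains untouched; `to_https` is a hypothetical helper name, not part of any Kerameikos tooling.

```python
import re

# Assumption: canonical Kerameikos URIs (concepts, ontology terms) all start with
# "http://kerameikos.org/"; URIs from other domains are not rewritten.
KERAMEIKOS_HTTP = re.compile(r"http://kerameikos\.org/")


def to_https(text: str) -> str:
    """Replace http Kerameikos URIs in `text` with their https equivalents."""
    return KERAMEIKOS_HTTP.sub("https://kerameikos.org/", text)


print(to_https("<http://kerameikos.org/id/athens> <http://example.org/other>"))
# → <https://kerameikos.org/id/athens> <http://example.org/other>
```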

Thursday, December 9, 2021

Using Wikidata APIs to regularize findspots

Combining attributes of two different pipelines, I have made a substantial update to the RDF ingestion process in the Kerameikos.org XForms back-end. As previously discussed in the development of a Linked Art JSON-LD harvester in fall 2019, findspot gazetteer URIs from the Getty Thesaurus of Geographic Names, the UK's Ordnance Survey, and Geonames.org are reconciled to Wikidata URIs. A SPARQL query is then issued to the Wikidata endpoint to extract the coordinates, feature type/class, and parent geographic entity, if applicable.

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

CONSTRUCT {
  ?place a skos:Concept ;
         rdfs:label ?placeLabel ;
         skos:closeMatch ?osgeo ;
         skos:closeMatch ?tgn ;
         skos:closeMatch ?geonames ;
         skos:closeMatch ?pleiades ;
         skos:broader ?parent ;
         dct:coverage ?coord ;
         dct:type ?type .
}
WHERE {
  ?place wdt:P1667 "7015539" . # TGN ID for Vulci
  OPTIONAL {?place wdt:P3120 ?osgeoid .
    BIND (uri(concat("http://data.ordnancesurvey.co.uk/id/", ?osgeoid)) AS ?osgeo)}
  OPTIONAL {?place wdt:P1667 ?tgnid .
    BIND (uri(concat("http://vocab.getty.edu/tgn/", ?tgnid)) AS ?tgn)}
  OPTIONAL {?place wdt:P1566 ?geonamesid .
    BIND (uri(concat("https://sws.geonames.org/", ?geonamesid, "/")) AS ?geonames)}
  OPTIONAL {?place wdt:P1584 ?pleiadesid .
    BIND (uri(concat("https://pleiades.stoa.org/places/", ?pleiadesid)) AS ?pleiades)}
  OPTIONAL {?place p:P625/ps:P625 ?coord}
  OPTIONAL {?place wdt:P131 ?parent}
  # is a human settlement (P279* also matches places typed directly as Q486972)
  OPTIONAL {?place wdt:P31/wdt:P279* ?type . FILTER (?type = wd:Q486972)}
  # is an archaeological site
  OPTIONAL {?place wdt:P31 ?type . FILTER (?type = wd:Q839954)}
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en"
  }
}
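The WHERE clause above is anchored on Vulci's TGN ID. A minimal Python sketch of how that anchoring triple might be derived from any supported gazetteer URI follows; it reuses the Wikidata properties from the query (P1667, P3120, P1566), but the regexes are assumptions about each gazetteer's URI shape, and `wikidata_lookup` is a hypothetical helper rather than the XForms implementation.

```python
import re

# Gazetteer URI patterns mapped to the Wikidata external-ID properties used in the
# query above. The patterns are assumptions, not an exhaustive list of URI variants.
GAZETTEERS = [
    (re.compile(r"https?://vocab\.getty\.edu/tgn/(\d+)"), "P1667"),
    (re.compile(r"https?://data\.ordnancesurvey\.co\.uk/id/([^/?#]+)"), "P3120"),
    (re.compile(r"https?://(?:sws\.)?geonames\.org/(\d+)"), "P1566"),
]


def wikidata_lookup(gazetteer_uri: str) -> tuple[str, str]:
    """Return (Wikidata property, identifier) for a supported gazetteer URI."""
    for pattern, prop in GAZETTEERS:
        match = pattern.match(gazetteer_uri)
        if match:
            return prop, match.group(1)
    raise ValueError(f"unsupported gazetteer URI: {gazetteer_uri}")


def anchor_triple(gazetteer_uri: str) -> str:
    """First triple pattern of the WHERE clause for this place."""
    prop, ident = wikidata_lookup(gazetteer_uri)
    return f'?place wdt:{prop} "{ident}" .'


print(anchor_triple("http://vocab.getty.edu/tgn/7015539"))
# → ?place wdt:P1667 "7015539" .
```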

An iterative process generates RDF for each place (crm:E53_Place) and spatial feature (dually crmgeo:SP5_Geometric_Place_Expression and geo:SpatialThing to be compatible with both CIDOC-CRM and the WGS84 ontology) and its parent region. Spatial features are only attached to a place if it is a human settlement or archaeological site (so no coordinates that represent the central point of a region or nation).
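The rule that only settlements and archaeological sites receive coordinates can be sketched as follows. It assumes the coordinate arrives as the WKT literal `Point(long lat)` that Wikidata returns for P625, and that ?type holds the full Wikidata entity URI; `spatial_feature` is a hypothetical name and the coordinates in the usage line are illustrative.

```python
import re
from typing import Optional

# Wikidata classes that qualify a place for a spatial feature (from the query above).
HUMAN_SETTLEMENT = "http://www.wikidata.org/entity/Q486972"
ARCHAEOLOGICAL_SITE = "http://www.wikidata.org/entity/Q839954"
NUMBER = r"-?\d+(?:\.\d+)?"
POINT = re.compile(rf"\s*Point\(\s*({NUMBER})\s+({NUMBER})\s*\)\s*", re.IGNORECASE)


def spatial_feature(coord_wkt: Optional[str], place_type: Optional[str]) -> Optional[dict]:
    """Return lat/long/WKT for a settlement or site; None for regions and nations.

    Wikidata serializes coordinates as WKT 'Point(long lat)', so longitude comes
    first; geo:lat and geo:long are split out for the WGS84 vocabulary.
    """
    if not coord_wkt or place_type not in (HUMAN_SETTLEMENT, ARCHAEOLOGICAL_SITE):
        return None  # no centroid geometry for regions or nations
    match = POINT.fullmatch(coord_wkt)
    if match is None:
        return None
    lon, lat = match.groups()
    return {"lat": float(lat), "long": float(lon), "wkt": f"POINT({lon} {lat})"}


print(spatial_feature("Point(11.63 42.42)", HUMAN_SETTLEMENT))
```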

This workflow had applied only to Linked Art JSON-LD ingestion, which had been prototyped with a handful of vases from the Indianapolis Museum of Art at Newfields. Subsequently, we have ingested several other collections, where CSV or JSON exports were loaded into OpenRefine for further reconciliation and exported into the CIDOC-CRM model through OpenRefine's templating system. Before implementing the template system for the Tampa Museum of Art, I had written a PHP script to turn the British Museum's CSV export from OpenRefine (following my own cleanup) into RDF. The script performed the Wikidata SPARQL lookups illustrated above in order to incorporate the place RDF hierarchy directly into the RDF/XML file with the BM's objects, which I uploaded into the Kerameikos.org SPARQL endpoint. I had also applied this workflow to the Getty collection.

Now that the Wikidata reconciliation and SPARQL-based lookups have been integrated directly into the RDF ingestion system in the Kerameikos.org XForms engine, I have eliminated any need for creating bespoke PHP scripts to perform findspot hierarchy lookups for any collection that we integrate into the project.


Essentially, museums can either provide Linked Art JSON-LD for harvesting (if the JSON-LD includes the necessary Kerameikos or Getty URIs), or clean up any spreadsheet in OpenRefine (with findspots reconciled directly to Wikidata URIs) and export it directly into RDF/XML following the templating principles outlined above. The Kerameikos.org ingestion workflow will fill in any gaps in findspot coverage and geographic hierarchy without further software intervention. This is a significant advancement in the sustainability of our data integration workflow and allows us to fully standardize the data model for findspot places.

I plan to implement these updates into the Nomisma.org ingestion engine next.

Thursday, December 2, 2021

Aligning Kerameikos.org more directly with CIDOC-CRM

When the Kerameikos.org project was founded in 2013, our intent was for the LOD thesaurus system to be modeled primarily in SKOS, with instances in certain categories to be designated subject-specific RDF classes in our own ontology (e.g., kon:Shape) or classes in existing ontologies (for example, foaf:Person and foaf:Group).

Our thesaurus is still built around SKOS, but since we have aligned our vase aggregation RDF model with Linked Art (a community-built CIDOC-CRM profile serialized as JSON-LD), I have subsequently made some alterations to the classes we use for concept URIs and updated our ontology.

These changes affect the RDF concepts themselves, and I have also searched and replaced the classes throughout the Kerameikos codebase.

  • foaf:Person has been replaced with crm:E21_Person
  • foaf:Group has been replaced with crm:E74_Group
  • kon:ProductionPlace has been replaced with crm:E53_Place and kon:ProductionPlace has been deprecated from the Kerameikos ontology.
    • Spatial expressions are dually compatible with CIDOC-CRM and the WGS84 ontology: the crm:E53_Place concept includes both geo:location and crm:P168_place_is_defined_by properties linking to the same node URI, which carries both the geo:SpatialThing and crmgeo:SP5_Geometric_Place_Expression classes. These spatial features may include geo:lat and geo:long (for points) or osgeo:asGeoJSON as before, but now also include the crmgeo:asWKT property with a datatype of http://www.opengis.net/ont/geosparql#wktLiteral, which should make these points and polygons compatible with endpoints that support the GeoSPARQL standard. See the machine-readable data underlying http://kerameikos.org/id/athens, for example.
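A sketch of that dual-typed spatial node, serialized as Turtle from Python for a single point. The CRMgeo namespace URI, the `#geometry` fragment used for the geometry node, and the coordinates in the usage line are assumptions for illustration; note that WKT lists longitude before latitude.

```python
# geo and geosparql are the standard W3C/OGC namespaces; the CRMgeo URI is an
# assumption about which namespace the Kerameikos data actually use.
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
GEOSPARQL = "http://www.opengis.net/ont/geosparql#"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
CRMGEO = "http://www.ics.forth.gr/isl/CRMgeo/"


def place_turtle(place_uri: str, geom_uri: str, lat: float, long: float) -> str:
    """E53_Place whose geo:location and P168 both point at one geometry node."""
    wkt = f"POINT({long} {lat})"  # WKT order: longitude, then latitude
    return "\n".join([
        f"@prefix geo: <{GEO}> .",
        f"@prefix crm: <{CRM}> .",
        f"@prefix crmgeo: <{CRMGEO}> .",
        "",
        f"<{place_uri}> a crm:E53_Place ;",
        f"    geo:location <{geom_uri}> ;",
        f"    crm:P168_place_is_defined_by <{geom_uri}> .",
        "",
        f"<{geom_uri}> a geo:SpatialThing, crmgeo:SP5_Geometric_Place_Expression ;",
        f"    geo:lat {lat} ;",
        f"    geo:long {long} ;",
        f'    crmgeo:asWKT "{wkt}"^^<{GEOSPARQL}wktLiteral> .',
    ])


print(place_turtle("http://kerameikos.org/id/athens",
                   "http://kerameikos.org/id/athens#geometry", 37.97, 23.72))
```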

The Kerameikos.org ontology page has been significantly revised to make it more transparent than before, in line with improvements we have made to the Nomisma page in recent years. The ontology URI now supports content negotiation: RDF/XML or Turtle can be requested as alternatives by sending the relevant MIME type in the Accept header. We have also implemented ontology versions, so that you can compare the 2015 edition with the current 2021 revision.
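A request for Turtle might look like the following Python sketch. The ontology URI is assumed to be https://kerameikos.org/ontology (following the later https migration), and the request is only built, not sent, so the header can be inspected; `ontology_request` is a hypothetical helper.

```python
import urllib.request

# Assumed ontology URI; text/turtle and application/rdf+xml are the registered
# MIME types for the two serializations mentioned above.
ONTOLOGY_URI = "https://kerameikos.org/ontology"
MIME_TYPES = {"turtle": "text/turtle", "rdfxml": "application/rdf+xml"}


def ontology_request(fmt: str) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for the ontology in `fmt`."""
    return urllib.request.Request(ONTOLOGY_URI, headers={"Accept": MIME_TYPES[fmt]})


req = ontology_request("turtle")
print(req.get_header("Accept"))  # → text/turtle
# To fetch: with urllib.request.urlopen(req) as resp: print(resp.read().decode("utf-8"))
```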

The ontology has been tightened up with better definitions of our few custom ceramic-oriented RDF classes (Shape, Technique, and Style), all of which are subclasses of crm:E55_Type. There is one property, kon:hasShape, a subproperty of crm:P2_has_type, intended to link a Human-Made Object (the vase, its rdfs:domain) to a kon:Shape (its rdfs:range). This expression is therefore fully compatible with CIDOC-CRM's own domains and ranges while also conforming to the standard intellectual vocabulary of pottery specialists. We may implement a "Fabric" class as a subclass of crm:E57_Material in order to make technical distinctions between, for example, the clay of Corinth and that of Attica. We will expand the scope of our ontology, and its relationship to CIDOC-CRM, as use cases arise.
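Because kon:hasShape is a subproperty of crm:P2_has_type and kon:Shape is a subclass of crm:E55_Type, a generic CIDOC-CRM consumer can recover plain CRM statements through standard RDFS entailment. A minimal sketch applying the rdfs7 (subPropertyOf) and rdfs9 (subClassOf) rules over prefixed-name triples; `ex:vase1` and `ex:kylix` are hypothetical identifiers.

```python
# Prefixed names stand in for full URIs; the axioms mirror the ontology described above.
SUBPROPERTY = {"kon:hasShape": "crm:P2_has_type"}
SUBCLASS = {
    "kon:Shape": "crm:E55_Type",
    "kon:Technique": "crm:E55_Type",
    "kon:Style": "crm:E55_Type",
}

Triple = tuple[str, str, str]


def infer(triples: set[Triple]) -> set[Triple]:
    """One pass of rdfs7 and rdfs9; enough here because each hierarchy is one level deep."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in SUBPROPERTY:                      # rdfs7: x p y, p subPropertyOf q => x q y
            inferred.add((s, SUBPROPERTY[p], o))
        if p == "rdf:type" and o in SUBCLASS:     # rdfs9: x type C, C subClassOf D => x type D
            inferred.add((s, "rdf:type", SUBCLASS[o]))
    return inferred


data = {("ex:vase1", "kon:hasShape", "ex:kylix"), ("ex:kylix", "rdf:type", "kon:Shape")}
for triple in sorted(infer(data) - data):
    print(triple)
```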

Wednesday, November 17, 2021

Techniques published to Kerameikos

About two dozen techniques have been published to Kerameikos.org, researched, prepared, and vetted by the project's graduate student interns and the Archaic/Classical Greek working group of Tyler Jo Smith and Renee Gondek. Some of these are linked hierarchically; for example, Black-figure has both silhouette and incised as parent concepts. These techniques were derived initially from the Beazley Archive Pottery Database before normalization and reconciliation with URIs in other vocabulary schemes, such as the Getty Art & Architecture Thesaurus.
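Because a technique may have more than one parent, collecting all of its broader concepts is a graph traversal rather than a walk up a single chain. A sketch over hypothetical slugs (in the vocabulary these would be full concept URIs linked by skos:broader):

```python
# Hypothetical slugs standing in for Kerameikos concept URIs.
BROADER = {
    "black-figure": ["silhouette", "incised"],
    "silhouette": [],
    "incised": [],
}


def ancestors(concept: str, broader: dict[str, list[str]]) -> set[str]:
    """All transitive skos:broader concepts, following every parent (polyhierarchy)."""
    seen: set[str] = set()
    stack = list(broader.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # guard against cycles and shared ancestors
            seen.add(parent)
            stack.extend(broader.get(parent, []))
    return seen


print(sorted(ancestors("black-figure", BROADER)))  # → ['incised', 'silhouette']
```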

Red-figure map


Friday, July 2, 2021

Nearly 4000 Getty Museum vases (and fragments) integrated into Kerameikos

Thanks to the help of David Newbury and Brenda Podemski at the Getty Museum, I have managed to integrate nearly 4,000 vases and fragments of vases from the Getty Museum into Kerameikos.org. Using the prototype Getty SPARQL endpoint, I was able to construct a query to extract all vessels created on or before 300 BC. This certainly includes more than our narrower scope of Archaic and Classical Athenian pottery (for example, more than 100 Red-figure Apulian vases), but we can revisit full reconciliation to relevant URIs at a later stage.

The data in the Getty SPARQL endpoint haven't been fully normalized to Getty vocabulary URIs, so the production places, materials, and shapes were parsed from textual statements and reconciled in OpenRefine. This process was relatively straightforward and took only a few hours.

I then spent some time muddling around with OpenRefine GREL conditionals for multiple artists in an export template, which I have uploaded to Gist. When more than one artist contributed to an object's production, the main production event consists of (crm:P9_consists_of) two parts, each carried out by a different individual. You could assign a role to these individuals at the production level, but the role can usually be extracted from the artist's SKOS concept, where we use the W3C Organization Ontology to assign a role of painter and/or potter. Between the Getty and BM data, we probably have enough specimens to map relationships between artists who collaborated in producing pottery.
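The production structure described above can be sketched in Python with the standard library's ElementTree instead of a GREL template. The nesting (P108i_was_produced_by, E12_Production, P9_consists_of parts, P14_carried_out_by) follows the description; the helper names, the single-artist shortcut, and the artist URIs in the usage line are assumptions.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("crm", CRM)


def crm(local: str) -> str:
    """Clark-notation name for a CIDOC-CRM term."""
    return f"{{{CRM}}}{local}"


def production(artist_uris: list[str]) -> ET.Element:
    """P108i_was_produced_by node; multiple artists become P9_consists_of parts."""
    prop = ET.Element(crm("P108i_was_produced_by"))
    event = ET.SubElement(prop, crm("E12_Production"))
    resource = f"{{{RDF}}}resource"
    if len(artist_uris) == 1:
        # Assumed shortcut: a single artist is attached to the main event directly.
        ET.SubElement(event, crm("P14_carried_out_by"), {resource: artist_uris[0]})
        return prop
    for uri in artist_uris:
        # One sub-production per artist, each carried out by that individual.
        part = ET.SubElement(ET.SubElement(event, crm("P9_consists_of")), crm("E12_Production"))
        ET.SubElement(part, crm("P14_carried_out_by"), {resource: uri})
    return prop


print(ET.tostring(production([
    "https://kerameikos.org/id/exekias",  # hypothetical artist URIs
    "https://kerameikos.org/id/painter_of_the_vatican_mourner",
]), encoding="unicode"))
```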

Getty 72.AE.148, a collaboration between Exekias and The Painter of the Vatican Mourner

For example, Douris produced more works with Onesimos than with any other artist (see SPARQL query). It wouldn't take much more work to build an API on this query that delivers JSON for the d3plus Network visualization library, much like what I've already done for Hellenistic monograms and Roman Republican die links in numismatic projects.
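The pairwise counting behind such a network API might look like this sketch. It assumes attributions have already been pulled from the SPARQL endpoint into a mapping of vase to artists, and emits nodes with `id` and links with `source`/`target`, the general shape d3plus Network consumes (check the d3plus documentation for the exact configuration). The data in the usage line are toy values, not real attributions.

```python
import json
from collections import Counter
from itertools import combinations


def collaboration_network(attributions: dict[str, set[str]]) -> dict:
    """Build nodes/links from vase -> artists, weighting links by shared vases."""
    pairs: Counter = Counter()
    for artists in attributions.values():
        # Sort so that (A, B) and (B, A) count as the same pair.
        for a, b in combinations(sorted(artists), 2):
            pairs[(a, b)] += 1
    names = sorted({name for artists in attributions.values() for name in artists})
    return {
        "nodes": [{"id": name} for name in names],
        "links": [{"source": a, "target": b, "weight": n} for (a, b), n in pairs.most_common()],
    }


toy = {"vase-1": {"Artist A", "Artist B"}, "vase-2": {"Artist B", "Artist A"}, "vase-3": {"Artist A", "Artist C"}}
print(json.dumps(collaboration_network(toy), indent=2))
```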

 


Of course, the Getty's images are IIIF, and linking to the IIIF manifest allows us to display and annotate multiple high-resolution images for an object.

Friday, May 28, 2021

Tampa Museum of Art joins Kerameikos + OpenRefine templates

The Tampa Museum of Art (TMA) has recently joined the Kerameikos project, supplying data and Creative Commons-licensed images for a dozen Attic vases that have been digitized so far as part of their new collections management system. These objects can be seen at the Kerameikos URI for the TMA.

Tampa Museum of Art
 

Importantly, this is the first collection normalized in OpenRefine and directly exported into the Linked Art CIDOC-CRM RDF/XML aggregation model. Previous collections from the British Museum and Fitzwilliam were reconciled to Kerameikos URIs in OpenRefine and then exported into CSV for external processing with PHP scripts. I'd like to get away from this bespoke scripting, and OpenRefine's export templates are more than adequate for generating RDF for import into the Kerameikos SPARQL endpoint.

I have added this template to Gist, and hopefully other projects can use it to do their own reconciliation and normalization and provide RDF to us without my personally doing this work. In the longer term, we aim to harvest Linked Art JSON-LD directly, which I previously prototyped in October 2019 with data from the Indianapolis Museum of Art.

First, you can see the TMA spreadsheet, post-reconciliation, here.

The forNonBlank GREL statement enables including properties or nodes only if a URI is present in the spreadsheet:

{{forNonBlank(cells["Shape URI"], c, '<kon:hasShape rdf:resource="' + c.value + '"/>', "")}}
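For readers more at home in Python than GREL, forNonBlank behaves roughly like the function below: GREL treats null and the empty string (but not whitespace) as blank. The `has_shape` helper and the sample URI are hypothetical, and unlike the GREL template it XML-escapes the attribute value.

```python
from xml.sax.saxutils import escape


def for_non_blank(value, render, fallback=""):
    """Rough analogue of GREL forNonBlank: null and "" count as blank."""
    if value is None or value == "":
        return fallback
    return render(value)


def has_shape(shape_uri):
    """Hypothetical helper mirroring the kon:hasShape template line."""
    def render(uri):
        # Escape &, <, > and double quotes so the URI is safe inside the attribute.
        safe = escape(uri, {'"': "&quot;"})
        return f'<kon:hasShape rdf:resource="{safe}"/>'
    return for_non_blank(shape_uri, render)


print(has_shape("https://kerameikos.org/id/example-shape"))
print(repr(has_shape("")))  # → ''
```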

Here kon:hasShape is a subproperty of crm:P2_has_type; otherwise the Kerameikos data model follows the Linked Art profile quite precisely. Concept URIs should be Kerameikos ones. The Linked Art JSON-LD harvester normalizes Getty and other URIs to Kerameikos ones when they are linked via skos:exactMatch.

Findspot URIs should be reconciled to Wikidata places:


{{forNonBlank(cells["Findspot URI"], c, '<crmsci:O19i_was_object_found_by>
    <crmsci:S19_Encounter_Event>
        <crm:P7_took_place_at>
            <crm:E53_Place>
                <rdfs:label xml:lang="en">' + escape(cells["Findspot"].value, "xml") + '</rdfs:label>
                <crm:P89_falls_within rdf:resource="' + c.value + '"/>                        
            </crm:E53_Place>
        </crm:P7_took_place_at>
    </crmsci:S19_Encounter_Event>
</crmsci:O19i_was_object_found_by>', "")}}
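Before exporting, it can help to flag findspots whose URI cell is filled in but does not hold a Wikidata entity URI. A sketch, assuming rows are dictionaries keyed by the spreadsheet's "Findspot" and "Findspot URI" column names and that reconciled URIs take the form http://www.wikidata.org/entity/ followed by a Q-ID; the helper name is hypothetical.

```python
import re

# Assumption: reconciled findspots are stored as Wikidata entity URIs (Q-IDs).
WIKIDATA_ENTITY = re.compile(r"https?://www\.wikidata\.org/entity/Q[1-9][0-9]*")


def unreconciled_findspots(rows: list[dict[str, str]]) -> list[str]:
    """Labels of rows whose 'Findspot URI' is filled in but is not a Wikidata entity."""
    problems = []
    for row in rows:
        uri = (row.get("Findspot URI") or "").strip()
        if uri and not WIKIDATA_ENTITY.fullmatch(uri):
            problems.append(row.get("Findspot", ""))
    return problems
```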

Measurements are expressed with Getty AAT URIs for the measurement type (e.g., height, width) and unit (cm, mm, etc.). The template below renders a height measurement in centimeters from the spreadsheet into RDF:

{{forNonBlank(cells["Height (cm)"], c, '<crm:P43_has_dimension>
    <crm:E54_Dimension>
        <crm:P90_has_value rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">' + c.value + '</crm:P90_has_value>
        <crm:P2_has_type rdf:resource="http://vocab.getty.edu/aat/300055644"/>
        <crm:P91_has_unit rdf:resource="http://vocab.getty.edu/aat/300379098"/>
    </crm:E54_Dimension>
</crm:P43_has_dimension>', "")}}
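Because the value is typed as xsd:decimal, a cell such as "12,5" or "n/a" would produce an invalid literal. A sketch of normalizing cells before export, assuming a comma is only ever used as a decimal separator; `to_xsd_decimal` is a hypothetical helper.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def to_xsd_decimal(cell: str) -> Optional[str]:
    """Return an xsd:decimal lexical form for a spreadsheet cell, or None if invalid."""
    text = cell.strip().replace(",", ".")  # assumption: comma = decimal separator
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None
    if not value.is_finite():
        return None  # NaN and Infinity have no xsd:decimal form
    return format(value, "f")  # fixed-point: xsd:decimal does not allow exponents


print(to_xsd_decimal("12,5"))  # → 12.5
```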

Presently (until the TMA sorts out a server issue that keeps their IIIF image info.json from being accessible), the TMA images are JPEG files generated through the IIIF Image API as 800-pixel-wide responses, but the model for representing IIIF services and manifests can be found at https://linked.art/model/digital/#iiif.
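For reference, an 800-pixel-wide JPEG is requested from a IIIF Image API service by setting the size segment to `800,` (width only, height scaled proportionally). A sketch with a hypothetical service base URI; under Image API 3.0, enlarging beyond the original width would additionally require the `^` prefix.

```python
def iiif_image_url(service_base: str, width: int = 800, fmt: str = "jpg") -> str:
    """Full region, `width` px wide (height proportional), no rotation, default quality."""
    return f"{service_base.rstrip('/')}/full/{width},/0/default.{fmt}"


def iiif_info_url(service_base: str) -> str:
    """The image information document a viewer needs before requesting tiles."""
    return f"{service_base.rstrip('/')}/info.json"


print(iiif_image_url("https://example.org/iiif/img1"))
# → https://example.org/iiif/img1/full/800,/0/default.jpg
```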

Wednesday, May 5, 2021

Prototype Object Viewer in Kerameikos

Over the last few days, I have put together a prototype of an object viewer within the Kerameikos.org framework that reads the vase URI from a URL parameter and executes a SPARQL query of the underlying Linked Art-compliant CIDOC-CRM data to gather all of the metadata necessary to create a nice, human-readable page. Constructing these pages includes an API call to Kerameikos to get all of the associated SKOS concept data for any kerameikos.org URI referred to by the vase RDF. This pipeline can be extended to query data APIs from Nomisma.org, the Getty vocabularies, or other controlled vocabulary systems.

I have taken the additional step of implementing an XSLT function that returns multilingual UI labels, even though almost none of these UI labels have been translated into other languages yet. However, the language (whether set by the browser's Accept-Language header or manually overridden with the 'lang' request parameter) is used to display the preferred label for each Kerameikos.org SKOS concept, if one is available in that language in the underlying RDF. This is often, though not always, the case for concepts that have been aligned to Wikidata, whose labels were extracted programmatically from the Wikidata API.
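The language selection described above can be sketched as follows: parse Accept-Language by quality value, let a `lang` parameter take precedence, and fall back to English. This is a Python illustration of the logic, not the XSLT function itself; both helper names are hypothetical.

```python
from typing import Optional


def preferred_languages(accept_language: str, lang_param: Optional[str] = None) -> list[str]:
    """Primary language subtags in preference order; a 'lang' parameter wins."""
    weighted = []
    for index, part in enumerate(accept_language.split(",")):
        fields = part.strip().split(";")
        tag = fields[0].strip().lower()
        if not tag or tag == "*":
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, val = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            weighted.append((-q, index, tag.split("-")[0]))
    ordered = [lang for _, _, lang in sorted(weighted)]  # highest q first, then header order
    if lang_param:
        ordered.insert(0, lang_param.lower())
    return list(dict.fromkeys(ordered))  # de-duplicate, keeping order


def pick_label(labels: dict[str, str], languages: list[str], default: str = "en") -> str:
    """skos:prefLabel in the first available language, falling back to English."""
    for lang in languages + [default]:
        if lang in labels:
            return labels[lang]
    return next(iter(labels.values()), "")


print(pick_label({"en": "Athens", "fr": "Athènes"}, preferred_languages("fr-FR,fr;q=0.9,en;q=0.8")))
```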

Collections that make their images available through IIIF manifests (represented by crm:P129i_is_subject_of), such as the Fitzwilliam Museum, will have these manifests rendered by Mirador. For collections that conform to the IIIF Image API but do not produce manifests, such as the British Museum, the image(s) will be displayed in the Leaflet IIIF viewer. Eventually, I will build an intermediate API that dynamically generates a manifest from the underlying IIIF image URIs so that these images can be annotated with iconographic URIs, in order to build a more LOD-integrated research tool for iconography. This framework will extend beyond vases to encompass other types of material culture.
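As a rough illustration of such an intermediate API, the sketch below builds a minimal IIIF Presentation 3.0 manifest from image service URIs. It assumes Image API 3.0 services (a 2.x service would be described as ImageService2 with @id/@type keys) and that each image's width and height have already been read from its info.json; the manifest and service URIs are hypothetical.

```python
import json


def manifest(manifest_id: str, label: str, images: list[dict]) -> dict:
    """Minimal Presentation 3.0 manifest: one canvas per image service.

    Each image dict needs 'service' (Image API base URI), 'width' and 'height'.
    """
    canvases = []
    for n, img in enumerate(images, start=1):
        canvas_id = f"{manifest_id}/canvas/{n}"
        service = img["service"].rstrip("/")
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "width": img["width"],
            "height": img["height"],
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/page/image",
                    "type": "Annotation",
                    "motivation": "painting",  # the image is the canvas content
                    "target": canvas_id,
                    "body": {
                        "id": f"{service}/full/max/0/default.jpg",
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": img["width"],
                        "height": img["height"],
                        "service": [{"id": service, "type": "ImageService3", "profile": "level1"}],
                    },
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }


print(json.dumps(manifest("https://example.org/manifest/vase-1", "Example vase",
                          [{"service": "https://example.org/iiif/img1", "width": 4000, "height": 3000}]),
                 indent=2))
```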


British Museum vase of Exekias, partially displayed in French.

These pages are constructed by the following URL pattern:

http://kerameikos.org/object/?uri={object URI}

A dynamic GeoJSON response that may contain the production place coordinates or polygon and/or the findspot coordinates follows the pattern:

http://kerameikos.org/object/geoJSON?uri={object URI}

Example: http://kerameikos.org/object/geoJSON?uri=https://www.britishmuseum.org/collection/object/G_1836-0224-127
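When building these URLs programmatically, percent-encoding the object URI is the safer default, since an object URI may itself contain `?`, `&`, or `#`. A sketch of both patterns; the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

OBJECT_BASE = "http://kerameikos.org/object/"


def object_page_url(object_uri: str) -> str:
    """Human-readable object page: /object/?uri={object URI}."""
    query = urlencode({"uri": object_uri})
    return f"{OBJECT_BASE}?{query}"


def object_geojson_url(object_uri: str) -> str:
    """Production place and/or findspot GeoJSON: /object/geoJSON?uri={object URI}."""
    query = urlencode({"uri": object_uri})
    return f"{OBJECT_BASE}geoJSON?{query}"


bm = "https://www.britishmuseum.org/collection/object/G_1836-0224-127"
url = object_geojson_url(bm)
print(url)
# → http://kerameikos.org/object/geoJSON?uri=https%3A%2F%2Fwww.britishmuseum.org%2Fcollection%2Fobject%2FG_1836-0224-127
assert parse_qs(urlparse(url).query)["uri"] == [bm]  # decoding recovers the object URI
```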

A link has been added to any image popup in the various concept pages (see below).

A popup of a vase of the Achilles Painter.
 

In the long term, I hope to be able to peel this functionality away from the Kerameikos.org software architecture and turn it into a standalone system that is more generalizable for any CIDOC-CRM data conforming to the profile expressed by the Linked Art community. This system is entirely driven by SPARQL queries at the moment, but I plan to integrate Fuseki with Solr or Elasticsearch to build out a faceted search interface and various data visualization tools, from geographic distributions to networks of artists to other sorts of statistical distributions. The system will be agnostic about specific types of content (vases), and could serve as a large-scale aggregation and research tool for many types of objects, a sort of new rendition of Pelagios' dormant Peripleo.