OAI2LOD Server

The OAI2LOD Server exposes any OAI-PMH-compliant metadata repository according to the Linked Data guidelines. This makes things and media objects accessible via HTTP URIs and queryable via the SPARQL protocol. Parts of the OAI2LOD architecture, especially the front-end, are based on the D2R Server implementation.

Further, it provides a configurable linking mechanism based on string similarity metrics. This allows the automatic linking of OAI-PMH data with other open data sets such as DBPedia or any other OAI-PMH repository exposed via the OAI2LOD Server.

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a web-based protocol for harvesting metadata in any format from remote metadata repositories that provide an OAI-PMH-enabled server. In recent years the protocol has gained much attention in the digital libraries and archives domain, and many institutions already provide such a service. Registered data providers include the Library of Congress OAI Repository, the National Library of Australia, and the Austrian National Library's Image Archive.

OAI2LOD Server Architecture

Demos

Image Archive of the Austrian National Library

Demo 1 exposes metadata of 25,000 historical images hosted by the Austrian National Library's Image Archive. Based on the location information given by the dc:subject property in the harvested data set, OAI2LOD has linked the records with DBPedia entries of type http://dbpedia.org/class/yago/Capital108518505. The similarity has been determined using the Levenshtein distance with a minimum similarity of 1.0, which amounts to exact string equality.

Library of Congress OAI Repository

Demo 2 exposes metadata of 10,000 items hosted by the Library of Congress. Based on the location information given by the dc:coverage property in the harvested data set, OAI2LOD has linked the records with DBPedia entries of type http://dbpedia.org/class/yago/Capital108518505. The links have been determined using the Soundex metric with a threshold of 0.98, which produces less reliable links than exact matching.
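To illustrate why a phonetic metric tends to produce less reliable links, here is a minimal Python sketch of the classic American Soundex encoding. It is an illustration only, not the SimMetrics implementation that OAI2LOD uses, and it does not reproduce how SimMetrics turns Soundex codes into a score between 0 and 1. The function name soundex is an assumption of this example.

```python
# Map each coded consonant to its Soundex digit.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the four-character American Soundex code of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            continue  # h and w do not separate equal codes
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        previous = code  # vowels reset the previous code
        if len(result) == 4:
            break
    return result.ljust(4, "0")


# Two different capitals collapse to the same code.
lima = soundex("Lima")
lome = soundex("Lome")
```

Here lima and lome are the same code although Lima and Lome are different cities, so a Soundex-based rule could link a record to the wrong capital where an exact string comparison would not.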

Feature Overview

OAI2LODServer - release 0.2

OAI2LODServer - release 0.1

Download and installation

The OAI2LOD Server is available on sourceforge.net, both as source and binary, under the GNU General Public License (GPL) and should run on any platform that supports Java 1.5 or higher.

Binary installation

Linux / Mac OS

  1. Download the oai2lod-server-x.x.tar.gz file and unpack it in any directory (tar xfz oai2lod-server-x.x.tar.gz).

Windows

  1. Not tested yet, but it should work: the zip file oai2lodserver-x.x.zip has the same contents.

Installation from source

The OAI2LOD Server sources are available at: https://mediaspaces.svn.sourceforge.net/svnroot/mediaspaces/oai2lod. Use your SVN client to check out the sources, then run ant in the base directory to compile the release files.

OAI2LOD Server Configuration

The main configuration file, which specifies the OAI endpoint, the number of records to be harvested, and so on, is written in N3 syntax. Here is an example for an OAI2LOD Server running on port 2020 and exposing metadata from the Austrian National Library's Image Archive. It is linked with DBPedia and links resources based on the types and properties specified for the source and target data sources:

  	
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix oai2lod: <http://www.mediaspaces.info/vocab/oai2lod-server-config.rdf#> .

<> a oai2lod:Server;
	rdfs:label "Example OAI2LOD Server";
	oai2lod:port 2020;
	oai2lod:baseURI <http://localhost:2020/>;
	oai2lod:publishes <oai1>;
	oai2lod:linkedWith <link1>;	
	.
	
<oai1> a oai2lod:OAIServer;
	oai2lod:serverURL <http://oai-bdb.onb.ac.at/Script/oai2.aspx>;
	oai2lod:metadataPrefix "oai_dc";
	oai2lod:styleSheet "xsl/oai_dc2rdf_xml.xsl";
	oai2lod:maxRecords 50;
	.
	
<link1> a oai2lod:LinkedSPARQLEndpoint;
	oai2lod:sparqlService <http://DBpedia.org/sparql>;
	oai2lod:maxResults 5000;
	oai2lod:linkingRule <lrule1>;
	.

<lrule1> a oai2lod:LinkingRule;
	oai2lod:sourceType <http://www.mediaspaces.info/vocab/oai-pmh.rdf#Item>;
	oai2lod:sourceProperty <http://purl.org/dc/elements/1.1/subject>;
	oai2lod:targetType <http://dbpedia.org/class/yago/Capital108518505>;
	oai2lod:targetProperty <http://www.w3.org/2000/01/rdf-schema#label>;
	oai2lod:linkingProperty <http://www.w3.org/2000/01/rdf-schema#seeAlso>;
	oai2lod:similarityMetrics "uk.ac.shef.wit.simmetrics.similaritymetrics.Levenshtein";
	oai2lod:minSimilarity 1.0;
	.	
	
	
  

The first part after the namespace declarations contains the server settings:

  1. the server name: rdfs:label "Example OAI2LOD Server";
  2. the server port: oai2lod:port 2020;
  3. the URL where the server can be reached: oai2lod:baseURI <http://localhost:2020/>; (this could also be a public host such as www.mediaspaces.info:2020). Do not forget the trailing slash.
  4. a reference to an OAI-PMH definition, which represents the second part of the server settings: oai2lod:publishes <oai1>;
  5. a reference to a (remote) SPARQL endpoint which represents the third part of the server settings: oai2lod:linkedWith <link1>;
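Because a missing trailing slash on the base URI is easy to overlook, a small pre-flight check on the configuration file may help. The following Python sketch is not part of OAI2LOD. It uses a regular expression instead of a real N3 parser, so it only handles the simple layout shown in the example configuration, and check_base_uri is a hypothetical helper name.

```python
import re

# Matches statements such as: oai2lod:baseURI <http://localhost:2020/>;
BASE_URI_PATTERN = re.compile(r"oai2lod:baseURI\s*<([^>]*)>")


def check_base_uri(config_text):
    """Return the configured base URI.

    Raises ValueError if the statement is missing or the URI
    does not end with a slash.
    """
    match = BASE_URI_PATTERN.search(config_text)
    if match is None:
        raise ValueError("no oai2lod:baseURI statement found")
    base_uri = match.group(1)
    if not base_uri.endswith("/"):
        raise ValueError(f"baseURI {base_uri!r} must end with a slash")
    return base_uri


good = "oai2lod:baseURI <http://localhost:2020/>;"
bad = "oai2lod:baseURI <http://localhost:2020>;"
checked = check_base_uri(good)
```

Calling check_base_uri(bad) raises a ValueError, which makes the problem visible before the server is started.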

The second part defines the OAI-PMH endpoint (NOTE: v0.2 supports only a single endpoint):

  1. the URL of the OAI-PMH server: oai2lod:serverURL <http://oai-bdb.onb.ac.at/Script/oai2.aspx>;
  2. the metadata format to be harvested, identified by its metadataPrefix: oai2lod:metadataPrefix "oai_dc";
  3. the path to the stylesheet for transforming OAI-PMH XML metadata into RDF/XML: oai2lod:styleSheet "xsl/oai_dc2rdf_xml.xsl";
  4. the maximum number of records to be harvested: oai2lod:maxRecords 50; -- The more records you harvest, the more memory is required. Currently the OAI2LOD Server has been tested with at most 25,000 records, which is already enough for many OAI endpoints.
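The interplay of serverURL, metadataPrefix and maxRecords can be sketched in Python as follows. This is an assumption-laden illustration of a capped OAI-PMH harvest (a ListRecords request followed by resumptionToken requests), not the code OAI2LOD actually runs. The names harvest and fake_fetch and the two sample responses are made up for this example. The fetch argument is any function that returns the response body for a URL, which keeps the sketch runnable without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(server_url, metadata_prefix, max_records, fetch):
    """Collect at most max_records <record> elements from an endpoint."""
    records = []
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while len(records) < max_records:
        root = ET.fromstring(fetch(server_url + "?" + urlencode(params)))
        page = root.findall(f"{OAI_NS}ListRecords/{OAI_NS}record")
        if not page:
            break
        # All records stay in memory, so the cap bounds memory use.
        records.extend(page[: max_records - len(records)])
        token = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # the repository has no further pages
        params = {"verb": "ListRecords",
                  "resumptionToken": token.text.strip()}
    return records


# Hypothetical two-page response used instead of a live repository.
PAGE_1 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>a</identifier></header></record>
    <record><header><identifier>b</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
PAGE_2 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>c</identifier></header></record>
    <record><header><identifier>d</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

requested = []


def fake_fetch(url):
    requested.append(url)
    return PAGE_2 if "resumptionToken=" in url else PAGE_1


harvested = harvest("http://oai-bdb.onb.ac.at/Script/oai2.aspx",
                    "oai_dc", 3, fake_fetch)
```

With a cap of 3, the sketch stops in the middle of the second page. Against a real endpoint, fetch could be a thin wrapper around urllib.request.urlopen.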

The third part defines the SPARQL endpoint that this OAI2LOD instance should be linked with (NOTE: v0.2 supports only a single endpoint):

  1. the URL of the SPARQL Service: oai2lod:sparqlService <http://DBpedia.org/sparql>;
  2. the maximum number of results requested in a single SPARQL call, that is, the LIMIT of the query: oai2lod:maxResults 5000;
  3. a reference to one or more linking rules: oai2lod:linkingRule <lrule1>;
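To make the role of oai2lod:maxResults concrete, the following Python sketch builds the kind of query these settings suggest and wraps it in a SPARQL protocol GET URL. The exact query that OAI2LOD sends is not documented here, so the query shape and the name build_target_query are assumptions. The target type and property are taken from the linking rule in the example configuration.

```python
from urllib.parse import urlencode


def build_target_query(target_type, target_property, max_results):
    """Build a SELECT query for candidate link targets.

    max_results becomes the LIMIT of the query.
    """
    return (
        "SELECT ?resource ?value WHERE { "
        f"?resource a <{target_type}> ; "
        f"<{target_property}> ?value . "
        "} "
        f"LIMIT {int(max_results)}"
    )


query = build_target_query(
    "http://dbpedia.org/class/yago/Capital108518505",
    "http://www.w3.org/2000/01/rdf-schema#label",
    5000,
)

# The SPARQL protocol passes the query text in the "query" parameter.
request_url = "http://DBpedia.org/sparql?" + urlencode({"query": query})
```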

Linking rules tell the OAI2LOD Server under which conditions a resource in the OAI2LOD data set is linked with a resource in the remote data set. Each rule must define a source and a target type as well as a source and a target property. The linking algorithm then compares all values X, which are objects of the source property in the source data set, with all values Y, which are objects of the target property in the target data set. If X and Y are similar enough, a link using the given linking property is created between the two resources. For each linking rule, the user can define a minimum similarity threshold (between 0 and 1) and the similarity algorithm to be used. Any algorithm provided by the SimMetrics library can be chosen; see the SimMetrics JavaDoc for the available classes.
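The comparison described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the OAI2LOD implementation: the similarity is the Levenshtein distance normalised by the longer string, the data sets are plain lists of (resource URI, value) pairs, and the source resource URI is hypothetical.

```python
def levenshtein(a, b):
    """Return the edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Map the distance to a score between 0 and 1 (1 means equal)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def link(source, target, linking_property, min_similarity):
    """Compare every source value X with every target value Y and
    return a (source, property, target) triple for each similar pair."""
    return [(source_uri, linking_property, target_uri)
            for source_uri, x in source
            for target_uri, y in target
            if similarity(x, y) >= min_similarity]


SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
# Hypothetical source resource with its dc:subject value.
source = [("http://localhost:2020/resource/item/1", "Vienna")]
# Target resources with their rdfs:label values.
target = [("http://dbpedia.org/resource/Vienna", "Vienna"),
          ("http://dbpedia.org/resource/Vientiane", "Vientiane")]

strict_links = link(source, target, SEE_ALSO, 1.0)
loose_links = link(source, target, SEE_ALSO, 0.5)
```

With a threshold of 1.0 only the exact match is linked, while a threshold of 0.5 also links the item to Vientiane. The nested comparison is quadratic in the number of values, which is one reason to keep maxRecords and maxResults moderate.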

Starting OAI2LOD Server

Go to the directory where you have installed the OAI2LOD Server (e.g. ~/Applications/oai2lod-server) and start the shell script oai2lod-server with the configuration file as its argument. For example:

  	
    ./oai2lod-server onb_config.n3
  

Shut down the service by pressing CTRL-C or by killing the process. A proper shutdown script is planned.

Support and Feature Requests

For support, feature requests, or any other form of input, please use the mediaspaces-users@lists.sourceforge.net mailing list provided by SourceForge.
