Tuesday, December 7, 2010

RDF, SPARQL and DBpedia Basics - My journey

I wanted to learn how to access DBpedia and extract the information I needed. With this aim set, I had to carve my own path to achieve it. In this post, I narrate that journey.

First, let me tell you what DBpedia is. DBpedia (http://dbpedia.org/About) contains most of the information in Wikipedia, but in a structured format. It is a database. Like any database, it has an associated data model, which determines how the data is stored and how it can be accessed. DBpedia follows the RDF (Resource Description Framework) model, in which data is stored as a graph. Like any database, it also needs a query language for retrieving the information we are interested in. DBpedia can be queried with SPARQL (SPARQL Protocol and RDF Query Language).
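To make the idea of an RDF graph a bit more concrete, here is a minimal sketch in Python. It assumes a graph is just a list of (subject, predicate, object) triples and treats a query as a pattern with some positions left open. The triples and the match helper are purely illustrative (they are not DBpedia's API or its actual data dump), and the prefixed names are kept as plain strings.

```python
# Illustrative triples only; prefixed names such as "dbr:Berlin" are plain strings here.
TRIPLES = [
    ("dbr:Berlin", "rdf:type", "dbo:City"),
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Paris", "rdf:type", "dbo:City"),
    ("dbr:Paris", "dbo:country", "dbr:France"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every position that is not None.

    A None position plays the role of a query variable: it matches anything.
    """
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# "Which subjects are cities?" The subject is left open, like ?city in SPARQL.
cities = [s for (s, _, _) in match(TRIPLES, predicate="rdf:type", obj="dbo:City")]
print(cities)
```

A real SPARQL engine does far more than this (joins across several patterns, filters, named graphs), but the core idea of matching triple patterns against a graph is the same.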

So, to get the information I am looking for from DBpedia, I need to understand how to construct SPARQL queries, and for that I need to understand RDF. The questions before me were: how do I start, and where? My first attempt was to find a tutorial on constructing SPARQL queries.

I found the tutorial at http://www.cambridgesemantics.com/2008/09/sparql-by-example/ very useful, though some of the results didn't match the expected results given. Maybe this is because the underlying data has been updated since the presentation was prepared. At this point, I would like you to note that SPARQL can be used to query any database that understands it - it's not limited to DBpedia. Further, SPARQL queries can be written against data described with different RDF vocabularies, such as FOAF.

Below are the key takeaways from the tutorial:

There are many generic query endpoints (an endpoint is an interface for posing a query to a database and getting the result) that support SPARQL queries.
These generic endpoints can query any data source that can be queried with SPARQL. Hence, we need to specify the data source explicitly. It is usually specified as one or more URIs, e.g. http://www.w3.org/People/Berners-Lee/card, http://www.dajobe.org/foaf.rdf etc.
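One common way to name the data source is inside the query itself, using SPARQL's FROM clause. The sketch below builds such a query as a string for one of the URIs mentioned above. The helper name names_query is my own, and the query assumes the document describes people with the FOAF foaf:name property. Nothing is sent over the network here.

```python
FOAF = "http://xmlns.com/foaf/0.1/"  # the FOAF vocabulary namespace


def names_query(source_uri):
    """Build a SELECT query that reads foaf:name values from one RDF document.

    The FROM clause tells a generic endpoint which document to load and query.
    """
    return (
        f"PREFIX foaf: <{FOAF}>\n"
        "SELECT ?name\n"
        f"FROM <{source_uri}>\n"
        "WHERE { ?person foaf:name ?name }"
    )


query = names_query("http://www.w3.org/People/Berners-Lee/card")
print(query)
```

The printed text can be pasted into the query box of a generic SPARQL endpoint. Pointing the same query at a different document only requires changing the URI passed to names_query.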

At the same time, there are many database-specific query endpoints. For them, the database is fixed and we can only query that one database. Some examples of such endpoints are given below:
At these endpoints, we only need to specify the query to get the information of interest from the already fixed database.
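With a fixed endpoint, a request therefore carries only the query. As a rough sketch, the snippet below prepares (but does not send) an HTTP GET request in the style of the SPARQL protocol, which passes the query text in a "query" parameter. I am assuming DBpedia's public endpoint at http://dbpedia.org/sparql and a JSON result format requested through the Accept header. The build_request helper and the sample query are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: DBpedia's public SPARQL endpoint; replace it with the endpoint you use.
ENDPOINT = "http://dbpedia.org/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Prepare a GET request that carries the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for SPARQL results in JSON; the endpoint may also offer other formats.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Illustrative query: a few types of the resource for Berlin. No FROM clause is needed.
QUERY = "SELECT ?type WHERE { <http://dbpedia.org/resource/Berlin> a ?type } LIMIT 5"

request = build_request(QUERY)
print(request.full_url)
```

To actually run the query, the prepared request could be passed to urllib.request.urlopen. I left that out so the sketch works without network access.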

SPARQL offers many ways to query the underlying database. The tutorial introduces some of the common constructs and keywords through examples, each followed by an explanation. This gives us the necessary sneak peek into SPARQL before we dive in deep.

During my search, I also came across an online interface called iSparql (http://demo.openlinksw.com/isparql/) for building SPARQL queries as graphs and then running them against any data source specified at Data Source URL. You can find tutorials on how to use it, SPARQL documentation etc. by clicking on the "Help" tab at the top. The direct link to the tutorial: http://wikis.openlinksw.com/dataspace/owiki/wiki/OATWikiWeb/InteractiveSparqlQueryBuilderTutorials
Overall, I found these tutorials very easy to follow.

Next:   RDF & SPARQL