Saturday, February 16, 2013

DBPedia

In the previous posts, I got an overview of how DBPedia, RDF and SPARQL are related, and learnt in more detail what RDF and SPARQL are. Since my initial aim was to be able to query DBPedia using SPARQL, I now have to understand the schema of DBPedia and try out some simple example queries against it. That is the agenda for this post.

One important point to know is that for almost every Wikipedia page there is a corresponding DBPedia page, and its URI is easy to obtain, as I will show through an example. Consider the Wikipedia page on Civil Engineering. Its URI is http://en.wikipedia.org/wiki/Civil_engineering. Its corresponding DBPedia page is http://dbpedia.org/page/Civil_engineering. Thus, all one has to do to get from the Wikipedia page to the DBPedia page is to replace http://en.wikipedia.org/wiki/ with http://dbpedia.org/page/ (the browsable page) or http://dbpedia.org/resource/ (the resource URI, which is the form used in SPARQL queries).
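The prefix replacement described above can be sketched in a few lines of plain Java. This is only an illustration: the class and method names (DbpediaUri, toPage, toResource) are my own and not part of DBPedia or any library, and it assumes the input is an English Wikipedia article URL of the form shown above.

```java
// Minimal sketch: map an English Wikipedia article URL to its DBPedia URIs
// by swapping the URL prefix. All names here are hypothetical helpers.
final class DbpediaUri {
    private static final String WIKI_PREFIX = "http://en.wikipedia.org/wiki/";
    private static final String PAGE_PREFIX = "http://dbpedia.org/page/";
    private static final String RESOURCE_PREFIX = "http://dbpedia.org/resource/";

    // URI of the browsable DBPedia page.
    static String toPage(String wikipediaUrl) {
        return swapPrefix(wikipediaUrl, PAGE_PREFIX);
    }

    // URI of the DBPedia resource, the form used inside SPARQL queries.
    static String toResource(String wikipediaUrl) {
        return swapPrefix(wikipediaUrl, RESOURCE_PREFIX);
    }

    private static String swapPrefix(String url, String newPrefix) {
        if (!url.startsWith(WIKI_PREFIX)) {
            throw new IllegalArgumentException("Not an English Wikipedia article URL: " + url);
        }
        // Keep the article name and put the new prefix in front of it.
        return newPrefix + url.substring(WIKI_PREFIX.length());
    }

    public static void main(String[] args) {
        String wiki = "http://en.wikipedia.org/wiki/Civil_engineering";
        System.out.println(toPage(wiki));     // http://dbpedia.org/page/Civil_engineering
        System.out.println(toResource(wiki)); // http://dbpedia.org/resource/Civil_engineering
    }
}
```

Since, as noted above, not every Wikipedia page has a DBPedia counterpart, the returned URI is only a candidate and may not exist.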

Now, how do I extract some information from this DBPedia page? First, let's have a look at the DBPedia page on Civil Engineering.


We can see that it has many properties whose values can be queried.

EXAMPLE:

Let's get the English abstract from the DBPedia page on Civil Engineering. As can be seen on that page, abstract is a property defined in the dbpedia-owl namespace, whose URI is http://dbpedia.org/ontology/. With this information, we are ready to write the SPARQL query (adapted from http://blog.3kbo.com/2008/08/11/dbpedia-examples-using-linked-data-and-sparql/).

SPARQL Query:

SELECT ?abstract
WHERE {
  <http://dbpedia.org/resource/Civil_engineering> <http://dbpedia.org/ontology/abstract> ?abstract .
  FILTER langMatches(lang(?abstract), 'en')
}

Try it out at the DBPedia-specific SNORQL endpoint.


Explanation:

The first line inside the WHERE block is a triple pattern that binds ?abstract to every value of the 'abstract' property of the Civil Engineering resource. The FILTER then keeps only the abstract whose language tag matches English, which is what is finally displayed.
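To reuse the same triple pattern and FILTER for other resources, the query text can be assembled in Java. The sketch below uses only the standard library. The names DbpediaQuery, abstractQuery and requestUrl are hypothetical, and the endpoint address http://dbpedia.org/sparql is an assumption on my part (the post itself only uses the SNORQL web form). It relies on the SPARQL protocol convention of passing the query text in a 'query' parameter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Minimal sketch: build the abstract query for any resource and language,
// and turn it into a GET URL for a SPARQL endpoint. Names are hypothetical.
final class DbpediaQuery {
    // Assumption: DBPedia's public SPARQL endpoint (not mentioned in the post).
    static final String ENDPOINT = "http://dbpedia.org/sparql";
    static final String ABSTRACT_PROPERTY = "http://dbpedia.org/ontology/abstract";

    // Same shape as the query above: one triple pattern plus a language FILTER.
    static String abstractQuery(String resourceUri, String lang) {
        return "SELECT ?abstract\n"
             + "WHERE {\n"
             + "  <" + resourceUri + "> <" + ABSTRACT_PROPERTY + "> ?abstract .\n"
             + "  FILTER langMatches(lang(?abstract), '" + lang + "')\n"
             + "}";
    }

    // The query travels URL-encoded in the 'query' parameter.
    static String requestUrl(String query) {
        try {
            return ENDPOINT + "?query=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String query = abstractQuery("http://dbpedia.org/resource/Civil_engineering", "en");
        System.out.println(query);
        System.out.println(requestUrl(query));
    }
}
```

The sketch does not validate or escape its arguments, so it should only be fed URIs and language codes that you trust.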

Result as displayed when SNORQL endpoint is used:

SPARQL results:

abstract
"Civil engineering is a professional engineering discipline that deals with the design, construction and maintenance of the physical and naturally built environment, including works such as bridges, roads, canals, dams and buildings. Civil engineering is the oldest engineering discipline after military engineering, and it was defined to distinguish non-military engineering from military engineering. It is traditionally broken into several sub-disciplines including environmental engineering, geotechnical engineering, structural engineering, transportation engineering, municipal or urban engineering, water resources engineering, materials engineering, coastal engineering, surveying, and construction engineering. Civil engineering takes place on all levels: in the public sector from municipal through to federal levels, and in the private sector from individual homeowners through to international companies."@en

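The trailing @en on the result above is the literal's language tag, the same tag that the FILTER tested. If a program receives the value in this quoted form, the text and the tag can be separated as in the sketch below. LangLiteral is a hypothetical helper of my own, and it deliberately ignores escape sequences and datatype suffixes.

```java
// Minimal sketch: split a result of the form "text"@lang into its two parts.
// Hypothetical helper; it does not handle escaped quotes or ^^datatype suffixes.
final class LangLiteral {
    final String text;
    final String lang; // empty string when the literal carries no language tag

    private LangLiteral(String text, String lang) {
        this.text = text;
        this.lang = lang;
    }

    static LangLiteral parse(String raw) {
        String s = raw.trim();
        int close = s.lastIndexOf('"');
        if (!s.startsWith("\"") || close <= 0) {
            throw new IllegalArgumentException("Not a quoted literal: " + raw);
        }
        String text = s.substring(1, close);  // between the outer quotes
        String rest = s.substring(close + 1); // "@en", or "" if untagged
        String lang = rest.startsWith("@") ? rest.substring(1) : "";
        return new LangLiteral(text, lang);
    }

    public static void main(String[] args) {
        LangLiteral lit = parse("\"Civil engineering is a professional engineering discipline\"@en");
        System.out.println(lit.lang); // en
        System.out.println(lit.text);
    }
}
```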


TRY OUT: Write a SPARQL query to get us a list of all churches in Paris.

Jena Framework

My first goal, gaining the knowledge required to query DBPedia for information of interest, was accomplished in my last post. However, the journey is not yet complete! It is very important to be able to run such queries automatically from a program. Hence, my next logical step was to get to know a framework that allows me to do this. Jena was just perfect for this requirement.

Let me give you a quick intro to Jena. Jena (http://jena.sourceforge.net/index.html) is an open source Java framework developed by HP Labs. It provides methods that Java programs can use to query datasets in certain formats, RDF being one of them, through SPARQL and other similar query languages.

To get information of interest from DBPedia, or for that matter from any other database, using Jena, I first need to know the methods Jena offers. I am sure many of you will agree that just reading about methods is boring. We can understand their usage faster and better through simple Java programs that use Jena to query a database. We need not stay loyal to DBPedia right away, because my immediate aim is to understand Jena's methods. So, this post is all about Jena and how to create and query a database through it. All the experimentation below was done on Windows XP.

STARTING UP
Of course, to achieve this aim, I first have to download Jena and set it up on my system. I downloaded Jena 2.6.4 from http://jena.sourceforge.net/index.html and Jena 2.3 from http://sourceforge.net/projects/jena/files/Jena/Jena-2.3/ to get jena-2.6.4.zip and Jena-2.3.zip.

1. I extracted the contents of jena-2.6.4.zip to get a folder named 'Jena-2.6.4'.
2. I have Eclipse (Version: Galileo) on my machine. I opened Eclipse.
3. Went to File->New->Project. In the hierarchy shown, expanded 'Java' by clicking on the '+' (which turns into '-' when expanded) and selected 'Java Project'. Clicked on the 'Next' button.
4. Selected 'Create project from existing source' and clicked on the 'Browse' button next to 'Directory' to open the 'Jena-2.6.4' folder.
5. Clicked on the 'Finish' button. Now, you should be able to see the 'Jena-2.6.4' folder in your sidebar under the 'Navigator' tab.
6. Expand the folder (by clicking on the '+' sign) to view the folders inside 'Jena-2.6.4'.

EXAMPLES


Some simple example Java programs are given under Jena-2.6.4->src-examples->Jena->examples->rdf, named Tutorial<no.>.java. The content of each of these files is explained at An Introduction to RDF and the Jena RDF API. However, to run all these tutorials successfully, we need a few .rdf files, such as 'vc-db-1.rdf', that are not included in the downloaded folder.


  1. vc-db-1.rdf - An explanation of the content of this file is provided at SPARQL Tutorial - Data Formats.
  2. vc-db-3.rdf - An explanation of the content of this file is provided at A Programmer's Introduction to RDQL.

Save the downloaded files in the 'Jena-2.6.4' folder. After this, I could run Tutorial05.java, Tutorial07.java and Tutorial08.java without any issues. However, Tutorial06.java gives an error 'WARN [main] (RDFDefaultErrorHandler.java:36) ...' that I have not been able to figure out.

Again, for Tutorial09.java, we need the 'vc-db-3.rdf' and 'vc-db-4.rdf' files.
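Because the tutorials depend on data files that do not ship with the download, a quick check of the project folder before running them can save some confusion. The sketch below uses only the standard library. TutorialData and missing are hypothetical names, and the default folder 'Jena-2.6.4' is simply the one used in this post.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: report which of the expected .rdf files are absent from a folder.
// Hypothetical helper; it does not use Jena itself.
final class TutorialData {
    static List<String> missing(File folder, String... names) {
        List<String> result = new ArrayList<String>();
        for (String name : names) {
            // isFile() is false both for absent entries and for directories.
            if (!new File(folder, name).isFile()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Pass the project folder as the first argument, or run from its parent folder.
        File folder = new File(args.length > 0 ? args[0] : "Jena-2.6.4");
        List<String> absent = missing(folder, "vc-db-1.rdf", "vc-db-3.rdf", "vc-db-4.rdf");
        System.out.println(absent.isEmpty() ? "All data files found." : "Missing: " + absent);
    }
}
```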