Tuesday, January 4, 2011

RDF & SPARQL

RDF

RDF is used to describe metadata (data about data). In RDF, data is represented as {subject, predicate, object} triples. As in English, the subject is the entity that the data is about, the predicate is the property of the subject being described, and the object is the value of that property. E.g., in the sentence "Agra is in India", "Agra" is the subject, "is in" can be considered the predicate, and "India" is the object. In the equivalent RDF graph, "Agra" and "India" are nodes joined by a directed edge labeled "is in".
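To make the triple idea concrete, here is a minimal sketch in Python. It is only an illustration of the data shape, not an RDF serialization: the Triple class and the plain-string labels are my own assumptions, and real RDF would use URIs instead of informal strings.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement (hypothetical helper, not part of any RDF library)."""
    subject: str
    predicate: str
    object: str


# "Agra is in India" as a single triple; the strings are informal labels, not real URIs.
fact = Triple(subject="Agra", predicate="is in", object="India")

# A graph is a set of such triples: subjects and objects are nodes, predicates are edges.
graph = {fact}

for s, p, o in graph:
    print(f"({s}) --[{p}]--> ({o})")
```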

RDF data can be stored in N3 (Notation3) format or in XML format (RDF/XML).

Example: (Taken from Quick Intro to RDF - http://www.rdfabout.com/quickintro.xpd)

The data graph representing relations of John can be written in N3 format as follows:
@prefix ns: <http://www.example.org/> .
ns:john    a             ns:Person .
ns:john    ns:hasMother  ns:susan .
ns:john    ns:hasFather  ns:richard .
ns:richard ns:hasBrother ns:luke .
 File Name: john.n3

In the above example, @prefix specifies the namespace used in the RDF data. Here, the namespace http://www.example.org/ is abbreviated as 'ns' and used in the rest of the data description. A namespace gives every name in the data a full, unambiguous URI, so that each resource is described as intended. In the example, john, richard, susan and luke are resources (entities), Person is a class, and hasMother, hasFather and hasBrother are properties. All of them are named in the namespace 'ns', which lets us state relations between them. The keyword 'a' in the first triple is N3 shorthand for rdf:type, so that line says that john is a Person.
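As a rough sketch of what the prefix abbreviation does, the Python snippet below expands a prefixed name into its full URI. The PREFIXES table and the expand function are hypothetical helpers written for this post; an actual N3 parser does this step for you.

```python
# Prefix table mirroring the @prefix line in john.n3.
PREFIXES = {"ns": "http://www.example.org/"}


def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'ns:john' into a full URI."""
    prefix, _, local_part = name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local_part


print(expand("ns:john"))       # http://www.example.org/john
print(expand("ns:hasMother"))  # http://www.example.org/hasMother
```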

Further, it is also necessary to specify how the entities in the namespace relate to each other. This is done using RDF Schema. The ontology language OIL extends RDF Schema to define more specific relations and properties. I have not studied RDF Schema or OIL in much detail, so please verify what I have said about them.

The same data graph can be written in XML format as follows:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:ns="http://www.example.org/">
  <ns:Person rdf:about="http://www.example.org/john">
    <ns:hasMother rdf:resource="http://www.example.org/susan" />
    <ns:hasFather>
      <rdf:Description rdf:about="http://www.example.org/richard">
        <ns:hasBrother rdf:resource="http://www.example.org/luke" />
      </rdf:Description>
    </ns:hasFather>
  </ns:Person>
</rdf:RDF>

File Name: john.xml

In the XML version, <rdf:RDF> is the root element that marks the XML document as an RDF document. It also carries the reference to the RDF namespace. Each <rdf:Description> element contains elements that describe a resource, which is identified by its rdf:about attribute. For more details, refer to the w3schools material on RDF (http://www.w3schools.com/rdf/rdf_main.asp).
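To see how the nested XML maps back to triples, here is a small sketch using Python's standard xml.etree.ElementTree module. It handles only the few RDF/XML constructs used in john.xml (rdf:about, rdf:resource, a typed node and a nested rdf:Description), so it is not a general RDF/XML parser. The helper names local and walk are my own, names are shortened to their local part for readability, and the ns namespace URI in the embedded copy is an assumption: use whatever your john.xml declares.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A copy of john.xml; the ns namespace URI is an assumption, use the one in your file.
JOHN_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:ns="http://www.example.org/">
  <ns:Person rdf:about="http://www.example.org/john">
    <ns:hasMother rdf:resource="http://www.example.org/susan" />
    <ns:hasFather>
      <rdf:Description rdf:about="http://www.example.org/richard">
        <ns:hasBrother rdf:resource="http://www.example.org/luke" />
      </rdf:Description>
    </ns:hasFather>
  </ns:Person>
</rdf:RDF>"""


def local(name):
    """Strip the namespace from an ElementTree tag ('{uri}Person') or from a URI."""
    return name.rsplit("}", 1)[-1].rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def walk(node, triples):
    """Collect triples from a node element and return the node's URI."""
    subject = node.get(f"{{{RDF}}}about")
    if node.tag != f"{{{RDF}}}Description":
        # A typed node such as <ns:Person> implies an rdf:type triple.
        triples.append((local(subject), "type", local(node.tag)))
    for prop in node:
        target = prop.get(f"{{{RDF}}}resource")
        if target is None:
            # The object is a nested node element; recurse into it.
            target = walk(prop[0], triples)
        triples.append((local(subject), local(prop.tag), local(target)))
    return subject


triples = []
for top in ET.fromstring(JOHN_XML):
    walk(top, triples)

for t in triples:
    print(t)
```

The four tuples printed correspond to the four lines of john.n3, with rdf:type shown simply as "type".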


SPARQL

Since SPARQL is used to query RDF data, it also describes data as {subject, predicate, object} triples. The general form of a SPARQL query is given below (taken from a presentation available online; sorry, I forgot the specific details):

# prefix declarations for namespaces
PREFIX foo: <http://example.com/resources/>
......

# result clause
SELECT ...

# dataset definition specifying the data to query
FROM <...>

# query pattern
WHERE {
.....
}

# query modifiers
ORDER BY ...

PREFIX serves the same purpose as @prefix in N3: it abbreviates a namespace URI.
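To get a feel for what the query pattern in the WHERE clause does, here is a sketch in Python that matches one triple pattern against the john.n3 data. It only illustrates the idea of binding variables to matching triples and is not how a SPARQL engine is actually implemented. The DATA list and the match function are hypothetical, and prefixed names are kept as plain strings.

```python
# The four triples from john.n3, written with prefixed names as plain strings.
DATA = [
    ("ns:john", "rdf:type", "ns:Person"),
    ("ns:john", "ns:hasMother", "ns:susan"),
    ("ns:john", "ns:hasFather", "ns:richard"),
    ("ns:richard", "ns:hasBrother", "ns:luke"),
]


def match(pattern, data=DATA):
    """Yield one {variable: value} binding per triple that fits the pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # the same variable would need two different values
            elif term != value:
                break  # a fixed term does not match this triple
        else:
            yield binding


# Roughly: SELECT ?who WHERE { ns:john ns:hasFather ?who }
for row in match(("ns:john", "ns:hasFather", "?who")):
    print(row)
```

Each printed dictionary plays the role of one row in a SPARQL result table.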

For complete details on the SPARQL query language with respect to RDF, refer to SPARQL Query Language for RDF. Below are my comments on this document.
  1. This document is easy to read with the knowledge gained from the previous post: RDF, SPARQL and DBPedia Basics - My Journey
  2. One can start directly from Section 1.2 (Document Conventions) of the document.
  3. For the first read, I just stayed on this page and did not navigate to any of the other links given on the page. It was still easy to follow.
  4. The examples given in Section 2 and Section 3 can be run and tested using the Twinkle tool. The example data is in N3 format, so store it as an N3 file and provide this file to the tool for querying. However, I am somehow getting the result entries in reverse order on Twinkle compared to the expected results shown in the document. I don't know how to fix it; I will come back and look at it later. I also need to figure out how to get the result on Twinkle in N3 format when the CONSTRUCT query form is used. By default, Twinkle shows it in RDF/XML format.
  5. Section 4.1.1 went over my head, as I am not very clear about URIs and URLs. So I just read through it without understanding much. I hope it doesn't affect my learning of SPARQL much. I will come back to this part later. However, I could understand the Relative IRIs part better through the example given in Section 4.2.
  6. Section 4.1.2 - Grammar Rules just gives the definitions of different literals (integer, decimal, string, etc.) as regular expressions. We generally know what we mean by an integer, a decimal or a string, so I think one can skip it if the regular expressions are a bit hard to follow. If you are coming across these terms for the first time, search for their textual definitions on Google and go through the examples and the introductory text. That's sufficient. The same is true of Section 4.1.3 and wherever Grammar Rules are given.
  7. Section 8 talks about RDF Dataset. I feel it is a must-read section for being able to tap into information from the whole dataset.
Some useful links:
  1. Difference between RDF and OWL
    Prev:   RDF, SPARQL and DBPedia Basics -  My Journey
    Next:   Twinkle Tool
