
I have an OWL document in the form of an XML file, and I want to extract elements from it. My code works for simple XML documents, but it does not work with OWL XML documents.

I was actually looking to get this element: /rdf:RDF/owl:Ontology/rdfs:label, for which I did this:

    DocumentBuilder builder = builderfactory.newDocumentBuilder();
    Document xmlDocument = builder.parse(
            new File(XpathMain.class.getResource("person.xml").getFile()));

    XPathFactory factory = javax.xml.xpath.XPathFactory.newInstance();
    XPath xPath = factory.newXPath();
    XPathExpression xPathExpression = xPath.compile("/rdf:RDF/owl:Ontology/rdfs:label/text()");
    String nameOfTheBook = xPathExpression.evaluate(xmlDocument,XPathConstants.STRING).toString();

I also tried extracting only the rdfs:label element this way:

    XPathExpression xPathExpression = xPath.compile("//rdfs:label");
    NodeList nodes = (NodeList) xPathExpression.evaluate(xmlDocument, XPathConstants.NODESET);

But this nodelist is empty.

Please let me know where I am going wrong. I am using Java XPath API.

Joshua Taylor
Shafi
  • You've gone wrong by using XPath to work with RDF. You should be using an RDF library. It's going to be easier to work with and less error prone than this approach; both Jena and Sesame are good options. – Michael Jun 11 '13 at 15:10
  • @Michael: You are right. But in my architecture, I am using XPath to refer to the element in the ontology. This XPath reference is stored in the db (as the value of a table attribute). Each db field could refer to different elements from different ontologies. I then access the db, get the element name, and use Jena to get the data model for this ontology reference element. What do you suggest? – Shafi Jun 13 '13 at 09:39
  • Frankly, that sounds like a bit of a Rube Goldberg machine. You can use SPARQL to access elements in an RDF model, which is a better approach than XPath queries you're getting out of a relational db. I'm not sure why you have a RDBMS involved, but I suspect your design would be better if you simply used a relational db *or* used an RDF database. – Michael Jun 13 '13 at 10:23

3 Answers


Don't query RDF (or OWL) with XPath

There's already an accepted answer, but I wanted to elaborate on @Michael's comment on the question. It's a very bad idea to try to work with RDF as XML (and hence with the RDF/XML serialization of an OWL ontology), and the reason is very simple: the same RDF graph can be serialized as lots of different XML documents. In the question, all that's being asked for is the rdfs:label of an owl:Ontology element, so how much could go wrong? Well, here are two serializations of the ontology.

The first is fairly human readable, and was generated by the OWL API when I saved the ontology using the Protégé ontology editor. The query in the accepted answer would work on this, I think.

<rdf:RDF xmlns="http://www.example.com/labelledOnt#"
     xml:base="http://www.example.com/labelledOnt"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <owl:Ontology rdf:about="http://www.example.com/labelledOnt">
        <rdfs:label>Here is a label on the Ontology.</rdfs:label>
    </owl:Ontology>
</rdf:RDF>

Here is the same RDF graph using fewer of the fancy features available in the RDF/XML encoding. This is the same RDF graph, and thus the same OWL ontology. However, there is no owl:Ontology XML element here, and the XPath query will fail.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns="http://www.example.com/labelledOnt#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" > 
  <rdf:Description rdf:about="http://www.example.com/labelledOnt">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Ontology"/>
    <rdfs:label>Here is a label on the Ontology.</rdfs:label>
  </rdf:Description>
</rdf:RDF>

You cannot reliably query an RDF graph in RDF/XML serialization by using typical XML-processing techniques.
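To make the difference concrete, here is a minimal sketch that runs one XPath query against trimmed-down copies of the two serializations above. It assumes the accepted answer's query is the local-name() one; the class name SerializationDemo and the helper label are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SerializationDemo {
    // Opening tag shared by both documents (namespace declarations only).
    static final String OPEN =
        "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
        + " xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'"
        + " xmlns:owl='http://www.w3.org/2002/07/owl#'>";

    // Typed-node form, as in the first serialization.
    static final String TYPED = OPEN
        + "<owl:Ontology rdf:about='http://www.example.com/labelledOnt'>"
        + "<rdfs:label>Here is a label on the Ontology.</rdfs:label>"
        + "</owl:Ontology></rdf:RDF>";

    // rdf:Description form, as in the second serialization.
    static final String PLAIN = OPEN
        + "<rdf:Description rdf:about='http://www.example.com/labelledOnt'>"
        + "<rdf:type rdf:resource='http://www.w3.org/2002/07/owl#Ontology'/>"
        + "<rdfs:label>Here is a label on the Ontology.</rdfs:label>"
        + "</rdf:Description></rdf:RDF>";

    // Assumed to be the accepted answer's query.
    static final String QUERY =
        "/*[local-name()='RDF']/*[local-name()='Ontology']/*[local-name()='label']/text()";

    // Parses the XML and returns the query result as a string ("" if nothing matches).
    static String label(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(QUERY, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("typed node:      \"" + label(TYPED) + "\"");
        System.out.println("rdf:Description: \"" + label(PLAIN) + "\"");
    }
}
```

The first call finds the label; the second returns an empty string, even though both documents describe the same graph.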

Query RDF with SPARQL

Well, if we cannot reliably query RDF with XPath, what are we supposed to use? The standard query language for RDF is SPARQL. RDF is a graph-based representation, and SPARQL queries include graph patterns that can match a graph.

In this case, the pattern that we want to match in a graph consists of two triples. A triple is a 3-tuple of the form [subject,predicate,object]. Both triples have the same subject.

  • The first triple says that the subject is of type owl:Ontology. The relationship “is of type” is rdf:type, so the first triple is [?something,rdf:type,owl:Ontology].
  • The second triple says that the subject (now known to be an ontology) has an rdfs:label, and that's the value that we're interested in. The corresponding triple is [?something,rdfs:label,?label].
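As a toy illustration of what matching this two-triple pattern means, here is a sketch in plain Java over an in-memory list of triples. It is not how a SPARQL engine such as Jena works internally; the class TriplePatternSketch, the ex: names, and the extra ex:Person resource are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TriplePatternSketch {
    /** One RDF triple: [subject, predicate, object], with prefixed names kept as plain strings. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    /** The ontology from the example, plus one unrelated labelled resource (hypothetical). */
    static List<Triple> sampleGraph() {
        return Arrays.asList(
            new Triple("ex:labelledOnt", "rdf:type", "owl:Ontology"),
            new Triple("ex:labelledOnt", "rdfs:label", "Here is a label on the Ontology."),
            new Triple("ex:Person", "rdf:type", "owl:Class"),
            new Triple("ex:Person", "rdfs:label", "Person"));
    }

    /** Matches [?something, rdf:type, owl:Ontology] and [?something, rdfs:label, ?label]. */
    static List<String> ontologyLabels(List<Triple> graph) {
        List<String> labels = new ArrayList<>();
        for (Triple typeTriple : graph) {
            // First triple of the pattern: the subject must be an owl:Ontology.
            if (!typeTriple.p.equals("rdf:type") || !typeTriple.o.equals("owl:Ontology")) {
                continue;
            }
            for (Triple labelTriple : graph) {
                // Second triple: same subject, predicate rdfs:label; the object binds ?label.
                if (labelTriple.s.equals(typeTriple.s) && labelTriple.p.equals("rdfs:label")) {
                    labels.add(labelTriple.o);
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(ontologyLabels(sampleGraph()));
    }
}
```

Only the ontology's label is returned; the label on ex:Person is skipped because its subject fails the first triple of the pattern. Nothing here depends on how the graph was written down as XML.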

In SPARQL, after defining the necessary prefixes, we can write the following query.

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?label WHERE {
  ?ontology a owl:Ontology ;
            rdfs:label ?label .
}

(Note that because rdf:type is so common, SPARQL includes a as an abbreviation for it. The notation s p1 o1; p2 o2 . is just shorthand for the two-triple pattern s p1 o1 . s p2 o2 ..)

You can run SPARQL queries against your model in Jena either programmatically or with the command line tools. If you do it programmatically, it is fairly easy to get the results out. To confirm that this query gets the value we're interested in, we can test it with Jena's arq command line tool.

$ arq  --data labelledOnt.owl --query getLabel.sparql
--------------------------------------
| label                              |
======================================
| "Here is a label on the Ontology." |
--------------------------------------
Joshua Taylor
  • You can also read [this answer](http://stackoverflow.com/a/1732454/1281433) while replacing regex by XPath and HTML by RDF. – Joshua Taylor Jun 11 '13 at 19:50
  • You are right. But in my architecture, I am using XPath to refer to the element in the ontology. This XPath reference is stored in the db (as the value of a table attribute). Each db field could refer to different elements from different ontologies. I then access the db, get the element name, and use Jena to get the data model for this ontology reference element. What do you suggest? – Shafi Jun 13 '13 at 09:39
  • @Shafi Can you clarify a little bit more? I haven't done much XPath work, so I don't know all the terminology. When you say that the XPath reference is stored in the database, do you mean that the XPath _query_ (something like `/rdf:RDF/owl:Ontology/rdfs:label`) is stored in the DB, or the _result_ is stored in the DB? – Joshua Taylor Jun 13 '13 at 12:14
  • @Shafi Regardless, the query that you're running in this case asks for the `rdfs:label` of something which is an `owl:Ontology`. You can write that sort of query in SPARQL as `SELECT ?label WHERE { ?o rdf:type owl:Ontology ; rdfs:label ?label }`, and then you'll get the labels of the ontologies in the model. This query would work on either of the serializations of the model shown above, because it's based on the RDF graph structure, not the XML structure. (I'll update my answer.) – Joshua Taylor Jun 13 '13 at 12:15
  • I've updated my answer to show how you can get the same data from the model using a SPARQL query. – Joshua Taylor Jun 13 '13 at 12:30
  • Thanks a lot, Joshua. This was a very useful edit. I am now seriously looking to change from XPath to SPARQL. In my previous comment, I meant that the XPath query itself (and not the result) is stored in the db. Now, going the SPARQL way, as I understand it, I also need to store the `PREFIX` lines in my db along with the `SPARQL` query. Right? What would be the best way to do that? I can use Jena as a reasoner for SPARQL queries. – Shafi Jun 14 '13 at 14:05
  • @Shafi I'm glad to hear that you're considering switching :). It is very easy to get XML-based solutions working in a “good-enough” way, especially if the data seems to have some structural similarity, but it is very brittle; if the version of the library that produces the RDF/XML changes, it might produce equivalent RDF, but different XML. It can be a very confusing problem to try to track down. In SPARQL, the PREFIX lines are part of the query, and let you write things like `owl:Ontology`, but you could still write that in a long form, viz., `<http://www.w3.org/2002/07/owl#Ontology>` … – Joshua Taylor Jun 14 '13 at 14:22
  • @Shafi … instead. Even so, since most of the prefixes are pretty standard, it would probably be OK to store just the `SELECT … WHERE { … }`, and add the standard prefixes when you run the query. I suppose it depends on whether you want the database contents to contain _complete_ SPARQL queries, or pieces that (are documented to) fit together into a template in a particular (and well documented) way. – Joshua Taylor Jun 14 '13 at 14:25
  • Sorry, but could you explain the equivalence between the two XML fragments? I looked in [OWL 2 Web Ontology Language Mapping to RDF Graphs](http://www.w3.org/TR/2012/REC-owl2-mapping-to-rdf-20121211/#Parsing_of_the_Ontology_Header_and_Declarations), and elsewhere, and came up empty. – Ed Staub Jul 12 '14 at 17:24
  • @EdStaub Sure, when you have a resource `x`, that will typically appear as `<rdf:Description rdf:about="x">…</rdf:Description>` where `x`'s properties are given as child elements of that element. However, because triples like `x rdf:type y` are so common, you can abbreviate `<rdf:Description rdf:about="x"><rdf:type rdf:resource="y"/>…</rdf:Description>` as `<y rdf:about="x">…</y>`. That doesn't actually have anything to do with OWL; it's just part of RDF/XML. See [2.13 Typed Node Elements](http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-typed-nodes). Examples 14 and 15 show both ways. – Joshua Taylor Jul 12 '14 at 17:33
  • @EdStaub That's just one of many places where there can be variation in how an RDF graph is serialized in RDF/XML. – Joshua Taylor Jul 12 '14 at 17:34

The query fails because XPath does not know the namespaces you are using. Try using:

"/*[local-name()='RDF']/*[local-name()='Ontology']/*[local-name()='label']/text()"

local-name() ignores the namespaces, so this will work (for the first instance of this that it finds).
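One consequence of ignoring namespaces is that local-name() also matches same-named elements from other vocabularies. The sketch below shows this, and a stricter variant that adds a namespace-uri() check. It is only an illustration: the class LocalNameQuery, the helper count, and the `other:` namespace are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LocalNameQuery {
    // Sample document with two "label" elements in different namespaces.
    // The other: namespace is hypothetical.
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
        + " xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'"
        + " xmlns:owl='http://www.w3.org/2002/07/owl#'"
        + " xmlns:other='http://www.example.com/other#'>"
        + "<owl:Ontology rdf:about='http://www.example.com/labelledOnt'>"
        + "<rdfs:label>Here is a label on the Ontology.</rdfs:label>"
        + "<other:label>Not an rdfs:label.</other:label>"
        + "</owl:Ontology></rdf:RDF>";

    // Matches any element whose local name is "label", whatever its namespace.
    static final String LOOSE = "//*[local-name()='label']";

    // Same, but also requires the RDF Schema namespace.
    static final String STRICT =
        "//*[local-name()='label'"
        + " and namespace-uri()='http://www.w3.org/2000/01/rdf-schema#']";

    // Returns how many nodes the query selects in the given XML.
    static int count(String xml, String query) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for namespace-uri() to see the namespaces
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("loose:  " + count(SAMPLE, LOOSE));
        System.out.println("strict: " + count(SAMPLE, STRICT));
    }
}
```

The loose query selects both label elements, while the strict one selects only rdfs:label.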

Sean F
  • Thanks, this works! Isn't there a cleaner solution? I need to run many such queries to access different parts of the ontology. – Shafi Jun 11 '13 at 05:19
  • There is a cleaner solution; for that you have to implement a NamespaceContext yourself. – cpz Jun 11 '13 at 05:20
  • To do that, have a look at the answers to this question: http://stackoverflow.com/questions/6390339/how-to-query-xml-using-namespaces-in-java-with-xpath – Sean F Jun 11 '13 at 05:23
  • @SeanF: How do we access attributes using the local-name? – Shafi Jun 11 '13 at 07:34
  • same thing would be .../*[local-name()='label']/@attributename – Sean F Jun 11 '13 at 12:43

You can use namespace prefixes in the query if you implement javax.xml.namespace.NamespaceContext yourself. Please have a look at this answer, which explains how to do it: https://stackoverflow.com/a/5466030/1443529
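A minimal sketch of that approach, using only the standard Java XML APIs, might look like the following. The class NamespaceAwareQuery, the helper evaluate, and the inline sample document are assumptions made for the example; the prefix map covers only the three namespaces used in the question's query.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceAwareQuery {
    // Prefix-to-URI bindings the XPath expressions may use.
    static final Map<String, String> PREFIXES = new HashMap<>();
    static {
        PREFIXES.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        PREFIXES.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        PREFIXES.put("owl", "http://www.w3.org/2002/07/owl#");
    }

    // Resolves the prefixes above; unknown prefixes map to "no namespace".
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            String uri = PREFIXES.get(prefix);
            return uri != null ? uri : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            for (Map.Entry<String, String> e : PREFIXES.entrySet()) {
                if (e.getValue().equals(uri)) return e.getKey();
            }
            return null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return (prefix == null
                    ? Collections.<String>emptyList()
                    : Collections.singletonList(prefix)).iterator();
        }
    };

    // Small stand-in for the OWL file in the question.
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
        + " xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'"
        + " xmlns:owl='http://www.w3.org/2002/07/owl#'>"
        + "<owl:Ontology rdf:about='http://www.example.com/labelledOnt'>"
        + "<rdfs:label>Here is a label on the Ontology.</rdfs:label>"
        + "</owl:Ontology></rdf:RDF>";

    // Evaluates a prefixed XPath expression and returns its string value.
    static String evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // the factory is not namespace-aware by default
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CONTEXT);
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(SAMPLE, "/rdf:RDF/owl:Ontology/rdfs:label/text()"));
    }
}
```

The prefixes in the XPath expression are resolved through the NamespaceContext, not through the document, so they only need to map to the same URIs that the document declares.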

cpz