Initially I have this file.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <owl:Class />
    <owl:Class />
    <owl:ObjectProperty />
    <Situation:Situation rdf:about="http://localhost/rdf#situa0">
        <Situation:composedBy />
    </Situation:Situation>
</rdf:RDF>

My goal is to extract the Situation node and its content using the XPath expression "RDF/Situation":

<Situation:Situation rdf:about="http://localhost/rdf#situa0">
    <Situation:composedBy />
</Situation:Situation>

I found a good example to work from in "Java How to extract a complete XML block".

I changed the tag names to my own, since I use namespaces and predefined tags.

Here's my code:

 public static void main(String... args) throws Exception {
        String xml = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"><owl:Class /><owl:Class /><owl:ObjectProperty /><Situation:Situation rdf:about=\"http://localhost/rdf#situa0\" ><Situation:composedBy /></Situation:Situation></rdf:RDF>";
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        Document doc = dbf.newDocumentBuilder().parse(
                new InputSource(new StringReader(xml)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        Node result = (Node) xPath.evaluate("RDF/Situation", doc, XPathConstants.NODE);

        System.out.println(nodeToString(result));
    }

    private static String nodeToString(Node node) throws TransformerException {
        StringWriter buf = new StringWriter();
        Transformer xform = TransformerFactory.newInstance().newTransformer();
        xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        xform.transform(new DOMSource(node), new StreamResult(buf));
        return (buf.toString());
    }

My goal is 90% achieved, but I have one problem: the Situation tag has an attribute about with the prefix rdf. The code works if I remove the prefix, but it fails with the prefix, even though I added xmlns:rdf to the root element.

<Situation:Situation rdf:about="http://localhost/rdf#situa0">

I got this error

ERROR: 'The namespace prefix' rdf 'has not been declared.' Exception in thread "main" javax.xml.transform.TransformerException: java.lang.RuntimeException: Namespace prefix 'rdf' has not been declared. com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform at (Unknown Source) com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform at (Unknown Source)

I added dbf.setNamespaceAware(true) as @Ian Roberts suggested, and then got other errors asking for the owl and Situation namespaces. After declaring them in the root tag, I got no output and no errors. I printed the variable result and this time it is null, so there's a problem with the XPath query.

I tried the same query in an online XPath tester and it worked fine there.

So what is the problem?

Is there any other way to do this job?

thx :)

Ali Ben Messaoud
  • That XML is not namespace-well-formed: you're using two prefixes you haven't declared, `owl:` and `Situation:`, so you can't use XPath to handle it. – Ian Roberts Mar 04 '14 at 11:05
  • I just need to extract the node "Situation", I removed the rdf prefix and it worked fine. – Ali Ben Messaoud Mar 04 '14 at 11:07
  • It _appears_ to work because you're parsing the XML with a non-namespace-aware parser (you have to call `setNamespaceAware(true)` on the `dbf` before calling `newDocumentBuilder` in order to make it namespace aware), but that's implementation specific. XPath is only defined over namespace-well-formed XML, and your code will probably stop working if you happen to introduce another XPath implementation to your classpath (e.g. Saxon). – Ian Roberts Mar 04 '14 at 11:30
  • I added `dbf.setNamespaceAware(true)` as you suggested, and then got other errors asking for the owl and Situation namespaces. After declaring them in the root tag, I got no output and no errors. What is the problem? – Ali Ben Messaoud Mar 04 '14 at 12:33
  • I printed the variable result, and this time it is null, so there's a problem with the XPath query. – Ali Ben Messaoud Mar 04 '14 at 12:36
  • You can do it with XPath if you configure a `NamespaceContext`, but given you are starting from a DOM document it would be much simpler to just use the DOM API and do `doc.getElementsByTagNameNS("http://localhost/Situation.owl#", "Situation")`. Or better still, use a proper RDF API such as Jena, as it's quite brittle to treat RDF as XML (there are many different ways to represent the same RDF graph in XML). – Ian Roberts Mar 04 '14 at 14:30
  • OK, and how do I convert a NodeList to a String, like the nodeToString() method that I have? thx :) – Ali Ben Messaoud Mar 04 '14 at 14:35
  • There seems to be no string inside the node you are trying to represent as String. Put something inside the `composedBy` tag. – helderdarocha Mar 04 '14 at 14:36
  • @helderdarocha, as usual, no result after running. :/ – Ali Ben Messaoud Mar 04 '14 at 14:42
  • Oh. I see now that it's not the string contents you are printing, but a string representation of the XML. It seems that your problem is in the transformer. I posted an answer. See if it works. – helderdarocha Mar 04 '14 at 14:49
  • Please note that it's generally not a good idea to try to work with RDF using XML processing tools, since there can be many XML serializations of the same RDF graph. See [How to access OWL documents using XPath in Java?](http://stackoverflow.com/q/17036871/1281433) and [this answer](http://stackoverflow.com/a/17052385/1281433) for more details. – Joshua Taylor Mar 04 '14 at 19:32
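
The DOM approach suggested in the comments, combined with the follow-up question about turning a NodeList into a String, could be sketched roughly as follows. This is only an illustration under assumptions: the `owl` and `Situation` prefixes are declared on the root element, `http://localhost/Situation.owl#` (the URI from the comment above) is taken as the Situation namespace, and the class and method names are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question.
class SituationByTagName {
    // Assumed namespace URI for the Situation prefix (taken from the comments).
    static final String SITUATION_NS = "http://localhost/Situation.owl#";

    // The question's document with all three prefixes declared on the root.
    static final String XML =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
        + " xmlns:owl=\"http://www.w3.org/2002/07/owl#\""
        + " xmlns:Situation=\"" + SITUATION_NS + "\">"
        + "<owl:Class/><owl:Class/><owl:ObjectProperty/>"
        + "<Situation:Situation rdf:about=\"http://localhost/rdf#situa0\">"
        + "<Situation:composedBy/></Situation:Situation></rdf:RDF>";

    // Serializes every Situation element found in the document, one per line.
    static String situationsToString(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // must be set before newDocumentBuilder()
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList nodes = doc.getElementsByTagNameNS(SITUATION_NS, "Situation");

        Transformer xform = TransformerFactory.newInstance().newTransformer();
        xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter buf = new StringWriter();
        for (int i = 0; i < nodes.getLength(); i++) {
            xform.transform(new DOMSource(nodes.item(i)), new StreamResult(buf));
            buf.write(System.lineSeparator());
        }
        return buf.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(situationsToString(XML));
    }
}
```

As the comments point out, this still depends on the particular RDF/XML serialization, so it is a workaround rather than a robust way to read RDF.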

2 Answers

Is there any other way to do this job?

Yes, there are other, more appropriate ways to do this job.

It's typically not a great idea to try to process RDF documents using XML tools, since the same RDF graph can often be represented in a number of different ways in RDF/XML. This is discussed in more detail in my answer to "How to access OWL documents using XPath in Java?", but we can see the issue pretty quickly here. After adding some additional namespace declarations, your data looks like this:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:Situation="https://stackoverflow.com/q/22170071/1281433/"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class/>
  <owl:Class/>
  <owl:ObjectProperty/>
  <Situation:Situation rdf:about="http://localhost/rdf#situa0">
    <Situation:composedBy></Situation:composedBy>
  </Situation:Situation>
</rdf:RDF>

The same RDF graph can be serialized like this, too:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:Situation="https://stackoverflow.com/q/22170071/1281433/"
    xmlns:owl="http://www.w3.org/2002/07/owl#" > 
  <rdf:Description rdf:nodeID="A0">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost/rdf#situa0">
    <rdf:type rdf:resource="https://stackoverflow.com/q/22170071/1281433/Situation"/>
    <Situation:composedBy></Situation:composedBy>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A1">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#ObjectProperty"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A2">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
  </rdf:Description>
</rdf:RDF>

If you're looking for a Situation:Situation element, you'll find one in the first serialization, but not the second, even though they're the same RDF graph.

You could probably use a SPARQL query to get what you're looking for. The typical implementation of describe queries might do what you want. E.g., the very simple query

describe <http://localhost/rdf#situa0>

produces this result (in RDF/XML):

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:Situation="https://stackoverflow.com/q/22170071/1281433/"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <Situation:Situation rdf:about="http://localhost/rdf#situa0">
    <Situation:composedBy></Situation:composedBy>
  </Situation:Situation>
</rdf:RDF>

Alternatively, you could ask for everything that has the type Situation:Situation:

prefix s: <https://stackoverflow.com/q/22170071/1281433/>
describe ?situation where {
  ?situation a s:Situation .
}

produces this result:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:s="https://stackoverflow.com/q/22170071/1281433/"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <s:Situation rdf:about="http://localhost/rdf#situa0">
    <s:composedBy></s:composedBy>
  </s:Situation>
</rdf:RDF>

The important point here is to use an appropriate query language for the type of data that you have. You have RDF, which is a graph-based data representation. An RDF graph is a set of triples. Your data is five triples:

_:BX2D6970b66dX3A1448f4e1bcfX3AX2D7ffe <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://localhost/rdf#situa0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://stackoverflow.com/q/22170071/1281433/Situation> .
<http://localhost/rdf#situa0> <https://stackoverflow.com/q/22170071/1281433/composedBy> "" .
_:BX2D6970b66dX3A1448f4e1bcfX3AX2D7ffd <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#ObjectProperty> .
_:BX2D6970b66dX3A1448f4e1bcfX3AX2D7fff <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .

In the Turtle serialization, the graph is:

@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix Situation: <https://stackoverflow.com/q/22170071/1281433/> .

[ a       owl:Class ] .

<http://localhost/rdf#situa0>
        a                     Situation:Situation ;
        Situation:composedBy  "" .

[ a       owl:Class ] .

[ a       owl:ObjectProperty ] .

You should use SPARQL (the standard RDF query language) or an RDF-based API for extracting data from RDF documents.

Joshua Taylor

There are several ways you can parse the file without actually having the namespaces in your XML file. You can add them to your root node directly:

rootElement.setAttribute("xmlns:owl", "http://www.w3.org/2002/07/owl#");
rootElement.setAttribute("xmlns:Situation", "http://localhost/Situation.owl#");

or you can configure a namespace resolver:

xPath.setNamespaceContext(new NamespaceContext() {
    public String getNamespaceURI(String prefix) {
        if (prefix.equals("rdf")) {
            return "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
        } else if (prefix.equals("owl")) {
            return "http://www.w3.org/2002/07/owl#";
        } else if (prefix.equals("Situation")) {
            return "http://localhost/Situation.owl#";
        } else {
            return XMLConstants.NULL_NS_URI;
        }
    }
    public String getPrefix(String namespaceURI) { return null;}
    public Iterator getPrefixes(String namespaceURI) { return null;}
});
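
A namespace resolver only helps if the XPath expression itself uses the prefixes, and the document has to be parsed by a namespace-aware parser. A minimal sketch of how these pieces might fit together is shown below. It assumes the `owl` and `Situation` prefixes are declared in the input, reuses `http://localhost/Situation.owl#` as the Situation namespace, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class PrefixedXPathSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    // Assumed namespace URI for the Situation prefix.
    static final String SITUATION_NS = "http://localhost/Situation.owl#";

    static final String XML =
        "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\""
        + " xmlns:owl=\"http://www.w3.org/2002/07/owl#\""
        + " xmlns:Situation=\"" + SITUATION_NS + "\">"
        + "<owl:Class/><owl:Class/><owl:ObjectProperty/>"
        + "<Situation:Situation rdf:about=\"http://localhost/rdf#situa0\">"
        + "<Situation:composedBy/></Situation:Situation></rdf:RDF>";

    // Returns the Situation element, or null if the query matches nothing.
    static Element findSituation(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Prefixes used in the XPath expression mapped to namespace URIs.
        final Map<String, String> ns = new HashMap<String, String>();
        ns.put("rdf", RDF_NS);
        ns.put("Situation", SITUATION_NS);

        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                String uri = ns.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String namespaceURI) { return null; }
            public Iterator<String> getPrefixes(String namespaceURI) { return null; }
        });

        // Note the prefixed steps: an unprefixed "RDF/Situation" would look
        // for elements in no namespace and match nothing here.
        return (Element) xPath.evaluate(
                "/rdf:RDF/Situation:Situation", doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Element situation = findSituation(XML);
        System.out.println(situation.getAttributeNS(RDF_NS, "about"));
    }
}
```

The prefixes in the XPath expression are resolved through the `NamespaceContext`, not through the document, so they do not have to match the prefixes used in the XML as long as the URIs are the same.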

You can also use a namespace-independent XPath expression:

xPath.evaluate("/*[local-name()='RDF']/*[local-name()='Situation']", doc, XPathConstants.NODE);
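
For completeness, here is a small sketch of the `local-name()` variant run against a namespace-aware DOM. It is an illustration only: the input is assumed to declare all of its prefixes, the Situation namespace URI is the assumed `http://localhost/Situation.owl#`, and the class and method names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class LocalNameXPathSketch {
    static final String XML =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
        + " xmlns:owl=\"http://www.w3.org/2002/07/owl#\""
        + " xmlns:Situation=\"http://localhost/Situation.owl#\">"
        + "<owl:Class/><owl:Class/><owl:ObjectProperty/>"
        + "<Situation:Situation rdf:about=\"http://localhost/rdf#situa0\">"
        + "<Situation:composedBy/></Situation:Situation></rdf:RDF>";

    // Finds the first element whose local name is "Situation" under the root,
    // regardless of which namespace it belongs to.
    static Node findSituation(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // No NamespaceContext is needed because the expression has no prefixes.
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (Node) xPath.evaluate(
                "/*[local-name()='RDF']/*[local-name()='Situation']",
                doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Node situation = findSituation(XML);
        System.out.println(situation.getNamespaceURI() + " " + situation.getLocalName());
    }
}
```

Because the expression ignores namespaces, it would also match a `Situation` element from an unrelated vocabulary, which is the price of not configuring a resolver.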

But it seems that the error you are getting comes from the transformer: it's not finding the rdf namespace. That's weird. Perhaps the declaration is not being copied to the result node, since the prefix is only used on an attribute and for some reason the parser did not carry it over (I'm just guessing). There might be a nicer way to fix that, but you could also explicitly add the namespace declaration to the result node before sending it to the transformer. Cast it to Element and then use setAttribute:

Element result = (Element) xPath.evaluate("/RDF/Situation", doc, XPathConstants.NODE);
result.setAttribute("xmlns:rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
helderdarocha
  • Try just adding the attribute (the two last lines I posted above). It should at least cause a different error to show up in Transformer. – helderdarocha Mar 04 '14 at 14:51