I have an XML of the form:

<?xml version="1.0" encoding="UTF-8"?>
<semseg:Envelope xmlns:semseg="http://a-random-URL" xmlns="http://another-random-URL">
    <semseg:subject>Subject</semseg:subject>
    <semseg:Sender>
        <semseg:name>Me</semseg:name>
    </semseg:Sender>
    <Triangle>
        <Triangle time='2017-11-29'>
            <Triangle key='a' value='b'/>
            <Triangle key='c' value='d'/>
            <Triangle key='e' value='f'/>
            <Triangle key='g' value='h'/>
        </Triangle>
    </Triangle>
</semseg:Envelope>

I am trying to retrieve the outer <Triangle> element (not <Triangle time='2017-11-29'>; element names are a bit repetitive in this XML) using XPath. The relevant part of the code is the following:

DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document doc = documentBuilder.parse("file.xml");

XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression xpr = xPath.compile("/semseg:Envelope/Triangle");
NodeList nodes = (NodeList)xpr.evaluate(doc, XPathConstants.NODESET);

I have tried many combinations for the XPath without any luck: no elements are ever selected. Nevertheless, testing the same XPath against the same XML file with an online XPath checker yields exactly the results I am looking for. It even works for attribute retrieval using XPaths like

/semseg:Envelope/Triangle/Triangle/@time

It seems there is a problem with the namespace prefixes: XML documents without any namespace prefixes work just fine with XPath.

mzjn
Niko
  • Have you had a look at https://stackoverflow.com/questions/7020638/xpath-parsing-of-namespace-declarations and https://stackoverflow.com/questions/3939636/how-to-use-xpath-on-xml-docs-having-default-namespace ? – GPI Sep 19 '18 at 13:35
  • I think you can try "//Triangle[not(@*)]" as your XPath – reflexdemon Sep 19 '18 at 13:36
  • @reflexdemon this does not work unfortunately – Niko Sep 19 '18 at 13:45
  • @GPI I cannot really use a namespace context since the prefix is only applicable to some of the elements in the XML. You can see that `Envelope` does have such a prefix but `Triangle` does not. – Niko Sep 19 '18 at 13:48

2 Answers

Your XML input actually has two namespaces.

Default namespace

The first is the default namespace, declared like this:

<semseg:Envelope ... xmlns="http://another-random-URL" ...

Because it is the default namespace, any XML element that has no prefix belongs to it.

semseg namespace

Declared like this:

<semseg:Envelope xmlns:semseg="http://a-random-URL" ...

This means every XML element prefixed with semseg belongs to this namespace.

Translating your requirements

So you're aiming at an XPath expression that will target

  • any Triangle element (no prefix, so that actually translates to any Triangle element from the http://another-random-URL namespace)
  • that is a direct child of a root semseg:Envelope element (that actually translates to a root element with the local name Envelope belonging to the http://a-random-URL namespace).
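
To make the translation above concrete, here is a minimal sketch that parses a trimmed-down copy of the XML and prints each element as {namespace URI}local name. The class name NamespaceInspection and its helper methods are illustrative assumptions, not part of the original code; only standard JAXP/DOM calls are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
public class NamespaceInspection {
    // Trimmed-down version of the XML from the question (no whitespace between elements).
    static final String XML =
        "<semseg:Envelope xmlns:semseg=\"http://a-random-URL\" xmlns=\"http://another-random-URL\">"
      + "<Triangle><Triangle time='2017-11-29'/></Triangle>"
      + "</semseg:Envelope>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required, otherwise namespace URIs are not reported
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Renders an element as {namespace URI}local name.
    static String describe(Element element) {
        return "{" + element.getNamespaceURI() + "}" + element.getLocalName();
    }

    public static void main(String[] args) throws Exception {
        Element root = parse(XML).getDocumentElement();
        Element triangle = (Element) root.getFirstChild();
        System.out.println(describe(root));     // {http://a-random-URL}Envelope
        System.out.println(describe(triangle)); // {http://another-random-URL}Triangle
    }
}
```

The unprefixed Triangle is therefore not "namespace-less": it carries the default namespace URI, which is why an XPath step written as plain Triangle (meaning "no namespace" in XPath 1.0) does not select it.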

Programming this in XPath.

We create a NamespaceContext that describes which namespaces we are working with: I define the prefixes I wish to use and map them to the namespace URIs. The XPath engine resolves prefixes through this mapping, so they do not have to match the prefixes used in the document; only the URIs matter. I map:

  • The main prefix to the http://a-random-URL namespace
  • The secondary prefix to the http://another-random-URL namespace

Using this mapping, I can translate your requirement to this XPath:

/main:Envelope/secondary:Triangle

And this works:

XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
xPath.setNamespaceContext(new NamespaceContext() {
    @Override
    public String getNamespaceURI(String prefix) {
        if ("main".equals(prefix)) {
            return "http://a-random-URL";
        }
        if ("secondary".equals(prefix)) {
            return "http://another-random-URL";
        }
        return null;
    }
    @Override
    public String getPrefix(String namespaceURI) {
        // Reverse lookup (URI to prefix). Not implemented here: this sample works without it.
        return null;
    }

    @Override
    public Iterator getPrefixes(String namespaceURI) {
        // Reverse lookup (URI to all bound prefixes). Not implemented here: this sample works without it.
        return null;
    }
});
XPathExpression xpr = xPath.compile("/main:Envelope/secondary:Triangle");
NodeList nodes = (NodeList)xpr.evaluate(doc, XPathConstants.NODESET);
System.out.println(nodes.getLength());

Outputs:

1

Here I have implemented a really dumb namespace context, but if you have Spring Framework, CXF, Guava (I think), or other frameworks within reach, you often have something like SimpleNamespaceContext or MapBasedNamespaceContext, which are probably better options.
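
If none of those libraries is available, a small map-backed context is easy to write by hand. The following is only a sketch under stated assumptions: the class name MapNamespaceContext, its bind method, and the countMatches helper are hypothetical names introduced for illustration, and the reserved xml and xmlns prefixes are not handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical map-backed NamespaceContext, for illustration only.
public class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri = new HashMap<>();

    // Registers a prefix-to-URI binding and returns this context for chaining.
    public MapNamespaceContext bind(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        // Unbound prefixes map to the empty string, as the interface contract describes.
        return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                matches.add(entry.getKey());
            }
        }
        return matches.iterator();
    }

    // Trimmed-down version of the XML from the question.
    static final String XML =
        "<semseg:Envelope xmlns:semseg=\"http://a-random-URL\" xmlns=\"http://another-random-URL\">"
      + "<Triangle><Triangle time='2017-11-29'><Triangle key='a' value='b'/></Triangle></Triangle>"
      + "</semseg:Envelope>";

    // Counts the nodes selected by the expression, using the main/secondary prefixes.
    static int countMatches(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new MapNamespaceContext()
                .bind("main", "http://a-random-URL")
                .bind("secondary", "http://another-random-URL"));
        NodeList nodes = (NodeList) xPath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countMatches(XML, "/main:Envelope/secondary:Triangle")); // 1
    }
}
```

Keeping the bindings in a map means new prefixes can be added without touching the lookup logic, and the reverse lookups no longer return null.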

GPI

This works for me:

/*[local-name()='Envelope']/*[local-name()='Triangle']/*[local-name()='Triangle']/@time
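
For completeness, here is a minimal sketch of how this expression could be evaluated from Java without any NamespaceContext. The class name LocalNameXPath and the evaluate helper are assumptions made for illustration. Note that local-name() ignores namespaces entirely, so the expression would also match same-named elements from any other namespace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, for illustration only.
public class LocalNameXPath {
    // Trimmed-down version of the XML from the question.
    static final String XML =
        "<semseg:Envelope xmlns:semseg=\"http://a-random-URL\" xmlns=\"http://another-random-URL\">"
      + "<Triangle><Triangle time='2017-11-29'><Triangle key='a' value='b'/></Triangle></Triangle>"
      + "</semseg:Envelope>";

    // Evaluates the expression against the XML and returns the result as a string.
    static String evaluate(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xPath = XPathFactory.newInstance().newXPath(); // no NamespaceContext set
        return xPath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        String time = evaluate(XML,
            "/*[local-name()='Envelope']/*[local-name()='Triangle']/*[local-name()='Triangle']/@time");
        System.out.println(time); // 2017-11-29
    }
}
```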

Niko
  • This does work, but I was looking for something more elegant and concise. That's why I upvoted it but did not mark it as the accepted answer. Thanks a lot though! – Niko Sep 19 '18 at 14:03