If I have an XML document like below:

<foo>
   <foo1>Foo Test 1</foo1>
   <foo2>
       <another1>
           <test10>This is a duplicate</test10>
       </another1>
   </foo2>
   <foo2>
       <another1>
           <test1>Foo Test 2</test1>
       </another1>
   </foo2>
   <foo3>Foo Test 3</foo3>
   <foo4>Foo Test 4</foo4>
</foo>

How do I get the XPath of <test1>, for example? The output should be something like: foo/foo2[2]/another1/test1

I'm guessing the code would look something like this:

public String getXPath(Document document, String xmlTag) {
   String xpath = "";
   ...
   //Get the node from xmlTag
   //Get the xpath using the node
   return xpath;
}

Suppose String XPathVar = getXPath(document, "<test1>"); is called. I need to get back an absolute XPath that will work in the following code:

XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression xpr = xpath.compile(XPathVar);
xpr.evaluate(document, XPathConstants.STRING);

But it can't be a shortcut like //test1, because the path will also be used for metadata purposes.

When printing the result out via:

System.out.println(xpr.evaluate(document, XPathConstants.STRING));

I should get the node's value. So if XPathVar = foo/foo2[2]/another1/test1 then I should get back:

Foo Test 2 and not This is a duplicate

J. Steen
  • How about `return "//" + xmlTag`? – JLRishe Jan 21 '13 at 09:54
  • @ThreaT you use xpath to get a node from a document, if you have the node in the first place, why would you turn the node into an xpath that can only be used to get the node again? – Nick Holt Jan 21 '13 at 10:07
  • How about `return "(//" + xmlTag + ")[1]";`? – JLRishe Jan 21 '13 at 10:09
  • That _is_ the node's absolute path in the XML document. Are you saying you need the full path, including all parent nodes? If so, why do you need that? – JLRishe Jan 21 '13 at 11:18
  • possible duplicate of [Generate/get xpath from XML node java](http://stackoverflow.com/questions/4746299/generate-get-xpath-from-xml-node-java) – ChrisF Jan 22 '13 at 14:58

2 Answers

You don't 'get' an XPath, in the same way that you don't 'get' SQL.

An XPath is a query you write based on your understanding of an XML document or schema, just as SQL is a query you write based on your understanding of a database schema. You don't 'get' either of them.

It would be possible to generate XPath expressions from the DOM simply by walking back up the nodes from a given node, though doing this generically enough, taking into account attribute values on each node, would make the resulting code next to useless. For example (with the warning that this only finds the first node that has a given name; XPath is much more than this, and you may as well just use the XPath //test1):

import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class XPathExample 
{
  private static String getXPath(Node root, String elementName)
  {
    for (int i = 0; i < root.getChildNodes().getLength(); i++)
    {
      Node node = root.getChildNodes().item(i);

      if (node instanceof Element)
      {
        if (node.getNodeName().equals(elementName))
        {
          return "/" + node.getNodeName();
        }
        else if (node.getChildNodes().getLength() > 0)
        {
          String xpath = getXPath(node, elementName);
          if (xpath != null)
          {
            return "/" + node.getNodeName() + xpath;
          }
        }
      }
    }

    return null;
  }

  private static String getXPath(Document document, String elementName)
  {
    return document.getDocumentElement().getNodeName() + getXPath(document.getDocumentElement(), elementName);
  }

  public static void main(String[] args) 
  {
    try 
    {
      Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(
        new ByteArrayInputStream(
          ("<foo><foo1>Foo Test 1</foo1><foo2><another1><test1>Foo Test 2</test1></another1></foo2><foo3>Foo Test 3</foo3><foo4>Foo Test 4</foo4></foo>").getBytes()
        )
      );

      String xpath = "/" + getXPath(document, "test1");
      System.out.println(xpath);        

      Node node1 = (Node)XPathFactory.newInstance().newXPath().compile(xpath).evaluate(document, XPathConstants.NODE);
      Node node2 = (Node)XPathFactory.newInstance().newXPath().compile("//test1").evaluate(document, XPathConstants.NODE);

      //This evaluates to true, hence you may as well just use the xpath //test1.
      System.out.println(node1.equals(node2));
    }
    catch (Exception e) 
    {
      e.printStackTrace();
    }
  }
}

Likewise, you could write an XML transformation that turned an XML document into a series of XPath expressions, but this transformation would be more complicated than writing the XPath in the first place, and so largely pointless.

Nick Holt
  • @ThreaT added an example, but as I said in the text - this will find the first node that has a given name, xpath is much more than this and you may as well just use the xpath `//foo2` if this is what you want to do. – Nick Holt Jan 21 '13 at 10:30
  • In your edited example in the question `//test1` will find the `test1` node. The `//` means select nodes in the document from the current node that match the selection no matter where they are - see http://www.w3schools.com/xpath/xpath_syntax.asp. – Nick Holt Jan 21 '13 at 10:43
  • Sorry, the example call should be `String xpath = "//" + getXPath(document.getDocumentElement(), "foo2");` – Nick Holt Jan 21 '13 at 10:45
  • Yes, but my point is that the xpath is a means to an end - you use it to get a node. Parsing the DOM in the general way that the code example does to produce the xpath /foo/foo2/another1/test1 is just the same as using //test1 and passing the `QName` `XPathConstants.NODE`. – Nick Holt Jan 21 '13 at 11:05
  • There shouldn't be a StackOverflowError - the method is recursive but exits when either the `root` matches the name you're looking for or all of the `root`'s children have been evaluated. – Nick Holt Jan 21 '13 at 11:07
  • When I run the code it produces - `/foo2/another1/test1` - I've replaced the code with the full example that I'm running. – Nick Holt Jan 21 '13 at 11:17
  • I've added some code to show how //test1 and /foo/foo2/another1/test1 can be used to get the same node from the DOM, hence making the code that parses the DOM redundant. – Nick Holt Jan 21 '13 at 11:44

How's this:

private static String getXPath(Document root, String elementName)
{
  try{
      XPathExpression expr = XPathFactory.newInstance().newXPath().compile("//" + elementName);
      Node node = (Node)expr.evaluate(root, XPathConstants.NODE);

      if(node != null) {
          return getXPath(node);
      }
  }
  catch(XPathExpressionException e) { /* "//" + elementName was not a valid expression: fall through and return null */ }

  return null;
}

private static String getXPath(Node node) {
    if(node == null || node.getNodeType() != Node.ELEMENT_NODE) {
        return "";
    }

    return getXPath(node.getParentNode()) + "/" + node.getNodeName();
}

Note that this is first locating the node (using XPath) and then using the located node to get its XPath. Quite the roundabout approach to get a value you already have.
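The comments on this answer ask when a step needs a positional predicate such as [2]. One possible rule, sketched below, is to add the 1-based position only where an element has same-named element siblings. This is a minimal sketch rather than a definitive implementation: the class name IndexedXPath and the helpers indexedXPath and sameName are hypothetical names introduced here, and the code assumes a document without namespaces (it compares getNodeName() values directly).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class IndexedXPath {

    // Hypothetical helper: walks up from the node to the document root and
    // emits one step per element, adding [n] only when the element has
    // same-named element siblings. Assumes no namespaces are in use.
    static String indexedXPath(Node node) {
        if (node == null || node.getNodeType() != Node.ELEMENT_NODE) {
            return ""; // reached the Document node (or no node at all)
        }
        int position = 1;          // 1-based index among same-named siblings
        boolean ambiguous = false; // true if any same-named sibling exists
        for (Node s = node.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (sameName(s, node)) {
                position++;
                ambiguous = true;
            }
        }
        for (Node s = node.getNextSibling(); s != null && !ambiguous; s = s.getNextSibling()) {
            if (sameName(s, node)) {
                ambiguous = true;
            }
        }
        String step = "/" + node.getNodeName() + (ambiguous ? "[" + position + "]" : "");
        return indexedXPath(node.getParentNode()) + step;
    }

    // Hypothetical helper: true if a is an element with the same name as b.
    static boolean sameName(Node a, Node b) {
        return a.getNodeType() == Node.ELEMENT_NODE
            && a.getNodeName().equals(b.getNodeName());
    }

    public static void main(String[] args) throws Exception {
        // The XML from the question, with the two foo2 elements.
        String xml = "<foo><foo1>Foo Test 1</foo1>"
            + "<foo2><another1><test10>This is a duplicate</test10></another1></foo2>"
            + "<foo2><another1><test1>Foo Test 2</test1></another1></foo2>"
            + "<foo3>Foo Test 3</foo3><foo4>Foo Test 4</foo4></foo>";
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Locate the first test1, then derive its indexed path.
        Node test1 = (Node) XPathFactory.newInstance().newXPath()
            .evaluate("//test1", document, XPathConstants.NODE);
        String path = indexedXPath(test1);

        System.out.println(path); // prints /foo/foo2[2]/another1/test1
        System.out.println(XPathFactory.newInstance().newXPath()
            .evaluate(path, document)); // prints Foo Test 2
    }
}
```

Emitting the index only where siblings share a name keeps the path short. If an index is wanted on every step, the ambiguous check can be dropped so that each step always carries its position.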

Working ideone example: http://ideone.com/EL4783

JLRishe
  • Found and fixed four of them. How is it now? – JLRishe Jan 21 '13 at 13:27
  • Just added a link to ideone. – JLRishe Jan 21 '13 at 13:35
  • Would you please explain _why_ you want these particular XPaths? `(//test1)[1]` **will** locate the first `test1` in the document, so if you must have these particular XPaths, at least tell us why. Are you having us do a school assignment for you? – JLRishe Jan 22 '13 at 07:39
  • Ok, so should the XPath only include the [1]s if there is ambiguity, based on the source XML, or should it include [1]s for every step of the path? – JLRishe Jan 22 '13 at 07:52
  • That doesn't answer my question at all. You just told me the output isn't "100% correct" because it doesn't include [x]es, so how do you want them handled? The current output from my function _will_ work to locate the first matching node using `xpr.evaluate(Document, XPathConstants.STRING);`, so **if you have specific requirements, please define them**. You have yet to do so once since you posted your question yesterday. – JLRishe Jan 22 '13 at 08:22
  • For one thing, `foo/foo2/another1/test[1]` _is_ a relative XPath and `(//test1)[1]` is not. But setting that aside, yes, that XPath would work, but so would `/foo/foo2/another1/test1`, or `/foo[1]/foo2/another1/test1`, or `/foo[1]/foo2[1]/another1[1]/test[1]`. In fact, all 4 of these (including the example you just gave) would have exactly the same result when passed to `System.out.println(xpr.evaluate(Document, XPathConstants.STRING));`, so what are your criteria for when a `[1]` is necessary and when it is not? – JLRishe Jan 22 '13 at 08:32
  • I can't edit your question because I don't know what you want. Every time I answer your question you change the requirements. First you asked for `//test1`, then you asked for `/foo/foo2/another1/test1`, and now you're asking for `/foo/foo2/another1/test1[1]`, and you haven't clarified when a part of the path needs a `[1]` and when it doesn't. – JLRishe Jan 22 '13 at 08:39