Could anyone provide an example of extracting all the elements, with their attributes and values, from an XML file using XPath in Java?

Thanks

Lucy
  • possible duplicate of [How to read XML using XPath in Java](http://stackoverflow.com/questions/2811001/how-to-read-xml-using-xpath-in-java) – daveb Apr 10 '12 at 11:21

2 Answers

I wrote this a few years back for my team. It should be helpful.

What is XPath?

  1. XPath is a language for finding information in an XML document.
  2. XPath is a syntax for defining parts of an XML document.
  3. XPath uses path expressions to navigate in XML documents.
  4. XPath contains a library of standard functions.
  5. XPath is a major element in XSLT.
  6. XPath is a W3C recommendation.

In XPath, there are seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and document (root) nodes. XML documents are treated as trees of nodes. The root of the tree is called the document node (or root node).
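
As a rough illustration of those node kinds, the sketch below counts how many nodes of several kinds a tiny document contains. The document and the class name NodeKindsDemo are made up for this example, and namespace nodes are left out to keep it short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NodeKindsDemo {
    // Hypothetical document: one comment, one processing instruction,
    // two elements, one attribute and one text node.
    static final String XML =
        "<root><!-- a comment --><?target data?><item key=\"v\">text</item></root>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Counts the nodes selected by an XPath expression using the count() function.
    static int count(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double n = (Double) xpath.evaluate(
            "count(" + expression + ")", doc, XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println("document nodes:          " + count(doc, "/"));
        System.out.println("element nodes:           " + count(doc, "//*"));
        System.out.println("attribute nodes:         " + count(doc, "//@*"));
        System.out.println("text nodes:              " + count(doc, "//text()"));
        System.out.println("comment nodes:           " + count(doc, "//comment()"));
        System.out.println("processing instructions: " + count(doc, "//processing-instruction()"));
    }
}
```

Each node kind has its own node test (text(), comment(), processing-instruction()), while elements and attributes are matched by name or by the * wildcard.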

Consider the following XML document.

<information>
    <person id="1">
        <name>Tito George</name>
        <age>25</age>
        <gender>Male</gender>
        <dob>
             <date>25</date>
             <month>october</month>
             <year>1983</year>
        </dob>
    </person>


     <person id="2">
        <name>Kumar</name>
        <age>32</age>
        <gender>Male</gender>
        <dob>
             <date>28</date>
             <month>january</month>
             <year>1975</year>
        </dob>
    </person>


    <person id="3">
        <name>Deepali</name>
        <age>25</age>
        <gender>Female</gender>
        <dob>
             <date>17</date>
             <month>january</month>
             <year>1988</year>
        </dob>
    </person>

</information>

Getting information from the Document

// Requires: java.io.File, javax.xml.parsers.*, javax.xml.xpath.*, org.w3c.dom.*
// Get an instance of DocumentBuilderFactory
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
// true makes the parser it produces support XML namespaces
domFactory.setNamespaceAware(true);
// Create the document builder
DocumentBuilder builder = domFactory.newDocumentBuilder();
// parse(String) expects a URI, so pass a File for a local path
Document doc = builder.parse(new File("C:\\JavaTestFiles\\persons.xml"));
// Get an instance of XPath
XPath xpath = XPathFactory.newInstance().newXPath();
// Compile the expression, then evaluate it against the document
XPathExpression expr = xpath.compile("//@id");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}

The xpath.compile(...) line is the one that compiles the XPath expression, and //@id is the actual expression. It selects every id attribute in the document, so the program prints 1, 2 and 3. The list further down shows other expressions that can be used with this document.

Two important statements in the above code snippet are:

  • expr = xpath.compile("//@id"); --> Compiles the expression. If the expression cannot be compiled, this method throws XPathExpressionException.
  • expr.evaluate(doc, XPathConstants.NODESET); --> Evaluates the XPath expression in the specified context and returns the result as the specified type. The second argument (returnType) defines what the method returns. If returnType is not one of the types defined in XPathConstants (NUMBER, STRING, BOOLEAN, NODE or NODESET), an IllegalArgumentException is thrown.
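
For anyone who wants to try the two statements without a file on disk, here is a minimal self-contained sketch. It assumes a shortened inline copy of the sample document instead of persons.xml, and the class and method names (IdAttributeDemo, idValues) are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IdAttributeDemo {
    // Assumption: a shortened inline copy of the sample document.
    static final String XML =
        "<information>"
        + "<person id=\"1\"><name>Tito George</name></person>"
        + "<person id=\"2\"><name>Kumar</name></person>"
        + "<person id=\"3\"><name>Deepali</name></person>"
        + "</information>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Compiles //@id, evaluates it as a node set and collects the attribute values.
    static List<String> idValues(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression expr = xpath.compile("//@id");
        NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(idValues(parse(XML)));
    }
}
```

Compiling once and reusing the XPathExpression is mainly worthwhile when the same expression is evaluated many times; for a one-off query, xpath.evaluate(expression, doc, returnType) does both steps in one call.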

Basically: An XML document is a tree-structured (hierarchical) collection of nodes. As with a hierarchical directory structure, it is useful to specify a path that points to a particular node in the hierarchy (hence the name of the specification: XPath).

In fact, much of the notation of directory paths is carried over intact:

  • The forward slash (/) is used as a path separator.
  • An absolute path from the root of the document starts with a /.
  • A relative path from a given location starts with anything else.
  • A double period (..) indicates the parent of the current node.
  • A single period (.) indicates the current node.
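
The directory-style notation can be tried out by evaluating relative paths against a context node instead of the whole document. The sketch below assumes a one-person cut of the sample document; RelativePathDemo and navigate are names invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RelativePathDemo {
    // Assumption: a one-person cut of the sample document.
    static final String XML =
        "<information><person id=\"1\"><name>Tito George</name>"
        + "<dob><year>1983</year></dob></person></information>";

    // Returns { year, name of ".", name of "..", text of "../name" }, all relative to dob.
    static String[] navigate() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Absolute path: starts with / and is resolved from the document root.
        Node dob = (Node) xpath.evaluate("/information/person/dob", doc, XPathConstants.NODE);

        // Relative paths: resolved from the context node passed as second argument.
        String year = xpath.evaluate("year", dob);
        Node self = (Node) xpath.evaluate(".", dob, XPathConstants.NODE);
        Node parent = (Node) xpath.evaluate("..", dob, XPathConstants.NODE);
        String name = xpath.evaluate("../name", dob);

        return new String[] { year, self.getNodeName(), parent.getNodeName(), name };
    }

    public static void main(String[] args) throws Exception {
        for (String value : navigate()) {
            System.out.println(value);
        }
    }
}
```

The second argument of evaluate is the context node, so the same relative expression can give different results depending on where it starts.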

Information

  • //@id --> Selects all attributes that are named id
  • //@* --> Selects all attribute nodes in the document
  • //@id='1' --> Tests whether an attribute id with the value '1' is present in the document. If it is, the expression evaluates to true. In this case XPathConstants.BOOLEAN should be used as the return type in the evaluate method.
  • /information/person[age='32']/name/text() or
    //person[age='32']/name/text() --> Returns 'Kumar'. Let us split the query /information/person[age='32']/name/text() into parts. Part 1: finds the 'person' node whose 'age' element equals 32. Part 2: gets the 'name' element of that node. Part 3: text() selects the text node of the 'name' element. Note: information is the root element, so a path that starts there with a single slash is an absolute path from the document root. A double slash '//' is shorthand for /descendant-or-self::node()/, so //person matches person elements at any depth without spelling out the steps above them.
  • //person/dob[year>'1978'][year<1985]/../name/text() --> This expression finds persons whose year of birth is between 1978 and 1985. Note the /.. step: year is not a direct child of person; its direct parent is dob. So we need to go one level up from dob to reach the element 'name'.
  • //person/dob[year>'1978'][year<1985]/../@id --> This returns the id of the person node that satisfies the above condition. Note: text() is not needed to get attribute values.
  • //person[age='25']//dob[date=25]/../name/text() --> Returns the name of the person whose age is 25 and whose dob date is 25.
  • /information/person[1]/name/text() --> Returns the name of the first person node.
  • /information/person/dob/child::*/text() --> Returns the text nodes of all child elements of dob. This can also be written with explicit axes as child::information/child::person/child::dob/child::*/text()
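
A few of the expressions above can be checked with a small harness like the following sketch. It rebuilds the sample document inline, uses age='32' so that the predicate matches Kumar, and picks the return type per expression. ExpressionListDemo, person and eval are names invented for the example.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ExpressionListDemo {
    // Builds one <person> element of the sample document.
    static String person(int id, String name, int age, String gender,
                         int date, String month, int year) {
        return "<person id=\"" + id + "\"><name>" + name + "</name><age>" + age
            + "</age><gender>" + gender + "</gender><dob><date>" + date
            + "</date><month>" + month + "</month><year>" + year + "</year></dob></person>";
    }

    // Inline copy of the sample document, without indentation.
    static final String XML = "<information>"
        + person(1, "Tito George", 25, "Male", 25, "october", 1983)
        + person(2, "Kumar", 32, "Male", 28, "january", 1975)
        + person(3, "Deepali", 25, "Female", 17, "january", 1988)
        + "</information>";

    // Evaluates an expression against the sample document with the given return type.
    static Object eval(String expression, QName returnType) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(eval("//@id='1'", XPathConstants.BOOLEAN));
        System.out.println(eval("/information/person[age='32']/name/text()", XPathConstants.STRING));
        System.out.println(eval("//person/dob[year>1978][year<1985]/../name/text()", XPathConstants.STRING));
        System.out.println(eval("//person/dob[year>1978][year<1985]/../@id", XPathConstants.STRING));
        System.out.println(eval("/information/person[1]/name/text()", XPathConstants.STRING));
    }
}
```

With XPathConstants.STRING the result is the string value of the first matching node in document order; use NODESET when all matches are needed.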
titogeo
  • I don't see any answer of the specific question asked: What is an XPath expression that selects all elements in an XML document? Besides this, this answer, while being a brave attempt to describe XML/XPath in one page, is rather imprecise and, if used alone, could lead a reader with incorrect perception of what XML and XPath are. – Dimitre Novatchev Apr 10 '12 at 13:05

Use the XPath expression "//*" like this:

Document doc = ... // the document on which apply XPath
XPathExpression xp = XPathFactory.newInstance().newXPath().compile("//*");
NodeList elements = (NodeList) xp.evaluate(doc, XPathConstants.NODESET);

It returns all the elements at any level.
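
To get from that node list to the attributes and values the question asks about, one possible approach is to walk the list with the DOM API, as in the sketch below. The inline document is a shortened, assumed copy of the sample data, and AllElementsDemo and describeElements are names invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AllElementsDemo {
    // Assumption: a shortened inline copy of the sample document.
    static final String XML =
        "<information>"
        + "<person id=\"1\"><name>Tito George</name><age>25</age></person>"
        + "<person id=\"2\"><name>Kumar</name><age>32</age></person>"
        + "</information>";

    // Returns one line per element: tag name, then its attributes, then its own text.
    static List<String> describeElements(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList elements = (NodeList) xpath.evaluate("//*", doc, XPathConstants.NODESET);

        List<String> lines = new ArrayList<>();
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            StringBuilder line = new StringBuilder(element.getTagName());

            // Append every attribute as @name=value.
            NamedNodeMap attributes = element.getAttributes();
            for (int a = 0; a < attributes.getLength(); a++) {
                Node attribute = attributes.item(a);
                line.append(" @").append(attribute.getNodeName())
                    .append('=').append(attribute.getNodeValue());
            }

            // Collect only the element's own text children, not text of nested elements.
            StringBuilder text = new StringBuilder();
            for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child.getNodeType() == Node.TEXT_NODE) {
                    text.append(child.getNodeValue().trim());
                }
            }
            if (text.length() > 0) {
                line.append(" = ").append(text);
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : describeElements(XML)) {
            System.out.println(line);
        }
    }
}
```

Trimming the text skips the whitespace-only text nodes that indentation produces in a pretty-printed file.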

dash1e
  • @user1210237: The answer you have accepted actually doesn't provide a solution to your problem. The answer by dash1e does provide the solution you asked for. I would strongly recommend that you accept *this* answer. The one currently accepted is an attempt for a mini introduction into XML/XPath. It is like a snapshot made from a thousand miles -- I would recommend using a more precise and systematic XML/XPath reference materials. – Dimitre Novatchev Apr 10 '12 at 13:08
  • Hi, I just changed the XPathExpression to XPathExpression xp = XPathFactory.newInstance().newXPath().compile("//*/text()"); and it worked fine. – Lucy Apr 10 '12 at 14:31