How do I get "The quick brown fox." from the following document:
<a>
<b>
Hello
<c/>
World
</b>
The quick brown fox.
</a>
As discussed in the comments, when dealing with mixed content it is important to know whether whitespace-only text nodes are being preserved or stripped.
Universal solution:
/a/text()[normalize-space()][1]
Meaning: the first text node child of the a
root element that is not whitespace-only
Another possibility:
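To make the difference between /a/text() and /a/text()[normalize-space()][1] concrete, here is a minimal Python sketch. It does not evaluate XPath: the standard library has no full XPath engine, so it walks the DOM with xml.dom.minidom and mimics what the two expressions select. The helper name text_children is hypothetical, and the sketch assumes the parser preserves whitespace-only text nodes (minidom does).

```python
from xml.dom import Node, minidom

XML = """<a>
<b>
Hello
<c/>
World
</b>
The quick brown fox.
</a>"""

XPATH_WHITESPACE = " \t\r\n"  # the characters normalize-space() treats as whitespace


def text_children(element):
    """Hypothetical helper: direct text-node children, like the XPath step text()."""
    return [n for n in element.childNodes if n.nodeType == Node.TEXT_NODE]


root = minidom.parseString(XML).documentElement

# Mimics /a/text(): includes the whitespace-only node between <a> and <b>.
all_text = text_children(root)

# Mimics /a/text()[normalize-space()]: keep nodes that are not whitespace-only.
non_blank = [n for n in all_text if n.data.strip(XPATH_WHITESPACE)]

# Mimics the trailing [1]: take the first remaining node.
first = non_blank[0].data

print(repr(first))                    # raw value, still carries the newlines
print(first.strip(XPATH_WHITESPACE))  # trimmed value
```

The raw value keeps its surrounding newlines, so it usually needs trimming before use. With a whitespace-stripping parser the first list would already lack the blank node, which is why the predicate form works in both cases.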
/a/text()[last()]
Meaning: the last text node child of the a
root element
text()
selects all child text nodes of the current node, so /a/text()
is the way to go. Just remember that you may need to do some string manipulation on the results, because an XML like this one:
<a>
<b>
Hello
<c/>
World
</b>
The quick <!--comment--> brown fox.
</a>
will return separate text nodes on either side of the comment ("The quick " and " brown fox."). Also, the text values will contain whitespace (e.g. the newline after </b>
and before "The").
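The string manipulation mentioned above might look like the following Python sketch. As an assumption for illustration, it uses xml.dom.minidom from the standard library rather than an XPath engine, collects the text node children of a by hand, then joins them and collapses runs of whitespace. The function name direct_text is hypothetical.

```python
from xml.dom import Node, minidom

XML = """<a>
<b>
Hello
<c/>
World
</b>
The quick <!--comment--> brown fox.
</a>"""


def direct_text(element):
    """Hypothetical helper: join the element's own text nodes and tidy whitespace."""
    # Only direct children are considered, so text inside <b> is ignored.
    parts = [n.data for n in element.childNodes if n.nodeType == Node.TEXT_NODE]
    # split() with no arguments splits on any whitespace run, so joining the
    # pieces with single spaces removes newlines and doubled spaces.
    return " ".join("".join(parts).split())


root = minidom.parseString(XML).documentElement

# The comment splits the sentence into two text nodes; a third,
# whitespace-only node sits between <a> and <b>.
pieces = [n.data for n in root.childNodes if n.nodeType == Node.TEXT_NODE]

result = direct_text(root)
print(pieces)
print(result)
```

Joining before normalizing matters here: the space on each side of the comment would otherwise be lost or doubled depending on how the pieces are trimmed individually.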
You can start with /a/text(). This gets you just the text nodes, not the tags.