Let's say I have an XML file like this one:
<books>
  <book>
    <title>John is alive</title>
    <abstract>
      A man is found alive after having disappeared for 10 years.
    </abstract>
    <description>
      <en> John disappeared 10 years ago. Lorem ipsum dolor sit amet ...</en>
      <fr> Il y a 10 ans, John disparaissait. Lorem ipsum dolor sit amet ...</fr>
    </description>
    <notes>First book in the series, where the character is introduced</notes>
  </book>
  <book>
    <title>The disappearance of John</title>
    <abstract>
      A prequel to the book "John is alive".
    </abstract>
    <description>
      <en> He led an ordinary life, but then ... lorem ipsum dolor sit amet ...</en>
      <fr> Sa vie était tout à fait ordinaire, mais ... lorem ipsum dolor sit amet ...</fr>
    </description>
    <notes>Second book in the "John" series, but first in chronological order</notes>
  </book>
</books>
My question is simple: how can I, using XPath, get a collection of all nodes that contain the word John?
Obviously, I can specify a series of nodes and that works fine:
(//title | //abstract | //description/* | //notes)[contains(lower-case(text()),"john")]
But my XML will grow, with new elements being added at various levels in the structure, and I don't want to keep going back to adjust my XPath.
What I fail to understand is why a generic statement like
//*[contains(lower-case(text()),"john")]
fails with this error message: Required cardinality of first argument of lower-case() is one or zero.
Yet, not all statements with an asterisk fail.
For instance:
//books/book/*[contains(lower-case(text()),"john")]
fails with the above error message, while
//books/book/*/*[contains(lower-case(text()),"john")]
succeeds and retrieves both the <en> and <fr> nodes from the first <description> element.
If it's not possible, fine: I will list all the elements in my XPath. But I would still like a clear understanding of how the * selector behaves inside a contains() call.
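In case it helps narrow things down, here is a small diagnostic sketch. It assumes the error has something to do with how many text nodes text() returns for a given element, which I have not confirmed. It uses Python's standard xml.dom.minidom only to count text-node children per element; it does not evaluate XPath 2.0, and the SAMPLE string and the text_node_counts helper are names made up for this illustration.

```python
from xml.dom import minidom

# Trimmed copy of the sample document (one book is enough to show the pattern).
SAMPLE = """<books>
<book>
<title>John is alive</title>
<description>
<en> John disappeared 10 years ago.</en>
<fr> Il y a 10 ans, John disparaissait.</fr>
</description>
</book>
</books>"""


def text_node_counts(xml_source):
    """Map each element name to the number of text-node children it has."""
    doc = minidom.parseString(xml_source)
    counts = {}
    # "*" matches every element in the document, in document order.
    for element in doc.getElementsByTagName("*"):
        counts[element.tagName] = sum(
            1 for child in element.childNodes
            if child.nodeType == child.TEXT_NODE
        )
    return counts


counts = text_node_counts(SAMPLE)
for name, n in counts.items():
    print(name, n)
```

On this trimmed sample, the elements that only hold text (title, en, fr) each report one text node, while the container elements (books, book, description) report several, because the whitespace between their child elements counts as text nodes too. That seems to line up with which of my expressions fail, but I would like someone to confirm whether this is really the cause.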