
In this xpath:

/A/B[C='hello']

Is C="hello" some kind of syntactic shortcut for C[text()='hello']? Is it documented anywhere?

Edit: Okay, I discovered one difference: C= returns all the text nodes in C and C's children, while C[text()= returns only the text nodes in C.

Now, suppose I have the XML:

<root>

  <A>
    <B>
      <C>hello<E>EEE</E>world</C>
      <D>world</D>
    </B>

    <B>
      <C>goodbye</C>
      <D>mars</D>
    </B>
  </A>

</root>

How would I choose the B node containing the first C node using the syntax C[text()=? I can get the B node using the C= syntax like this:

/root/A/B[C="helloEEEworld"]

But this doesn't work:

/root/A/B[C[text()="helloworld"]]

nor do these:

/root/A/B[C[text()="hello world"]]
/root/A/B[C[text()="helloEEEworld"]]

Hmmm...this works:

/root/A/B[C[text()="hello"]]

Why is that? Does text() only return the first text node? According to the W3C, text() returns all text node children of the context node.

7stud

2 Answers


text() really does return all child text nodes, as a node-set.

When you use /root/A/B[C[text()="hello"]], you are asking for a B node that has a C child, any of whose direct child text nodes equals "hello".

In the same way, you can match it with:

/root/A/B[C[text()="world"]]

or explicitly select the first or second direct child text node:

/root/A/B[C[text()[1]="hello"]]
/root/A/B[C[text()[2]="world"]]

If you want to match the node by its complete text content (its string-value), you can use

/root/A/B[C[.="helloEEEworld"]]

or

/root/A/B[C="helloEEEworld"]
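
To make the difference between text() and the string-value concrete, here is a minimal Python sketch. It does not run an XPath engine: the standard library's ElementTree does not support text(), so the helpers `child_text_nodes` and `string_value` are hypothetical names that only approximate what text() and `.` see for the C element from the question.

```python
import xml.etree.ElementTree as ET

# The first C element from the question's XML.
c = ET.fromstring("<C>hello<E>EEE</E>world</C>")

def child_text_nodes(elem):
    """Approximates text(): the direct child text nodes, in document order."""
    nodes = []
    if elem.text:              # text before the first child element
        nodes.append(elem.text)
    for child in elem:
        if child.tail:         # text that follows each child element
            nodes.append(child.tail)
    return nodes

def string_value(elem):
    """Approximates '.': all descendant text concatenated."""
    return "".join(elem.itertext())

print(child_text_nodes(c))     # ['hello', 'world']
print(child_text_nodes(c)[0])  # hello  (like text()[1])
print(child_text_nodes(c)[1])  # world  (like text()[2])
print(string_value(c))         # helloEEEworld
```

The text inside E ("EEE") is not a direct child text node of C, which is why it shows up only in the string-value.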
Andersson
  • [This ticket](https://stackoverflow.com/questions/38240763/xpath-difference-between-dot-and-text) might also be useful for you – Andersson Jan 22 '18 at 13:01

C in the predicate expression [C='hello'] returns all C elements that are direct children of the context element, which is B. So the entire predicate is a boolean expression that compares a node-set with a string (note that an element is a type of node in the XPath data model), and the behavior in this case is documented in the spec as follows:

If one object to be compared is a node-set and the other is a string, then the comparison will be true if and only if there is a node in the node-set such that the result of performing the comparison on the string-value of the node and the other string is true. If one object to be compared is a node-set and the other is a boolean, then the comparison will be true if and only if the result of performing the comparison on the boolean and on the result of converting the node-set to a boolean using the boolean function is true. [source]

C='hello' in /A/B[C='hello'] evaluates to true if any of the C elements, after conversion to a string, equals 'hello'. So it is more of a shortcut for C[string()='hello'], if you will.

"Hmmm...this works:

/root/A/B[C[text()="hello"]]

Why is that? Does text() only return the first text node? According to the W3C, text() returns all text node children of the context node."

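text() in this context returns all direct child text nodes, not just the first one. This is because child:: is the default axis in XPath. By the node-set comparison rule quoted above, the predicate is true as soon as any one of those text nodes equals "hello". Compare your XPath with its equivalent verbose version: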

/child::root/child::A/child::B[child::C[child::text()="hello"]]
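
The "any node in the node-set" rule can be sketched in Python as well. This is an emulation under stated assumptions, not real XPath evaluation: `text_children`, `any_text_equals` and `string_value_equals` are hypothetical helpers built on the standard library's ElementTree, standing in for C[text()=...] and C[.=...] respectively.

```python
import xml.etree.ElementTree as ET

DOC = """<root>
  <A>
    <B>
      <C>hello<E>EEE</E>world</C>
      <D>world</D>
    </B>
    <B>
      <C>goodbye</C>
      <D>mars</D>
    </B>
  </A>
</root>"""

def text_children(elem):
    """Direct child text nodes of elem (roughly what child::text() selects)."""
    nodes = [elem.text] if elem.text else []
    nodes += [child.tail for child in elem if child.tail]
    return nodes

def any_text_equals(elem, literal):
    """Emulates elem[text()=literal]: true if ANY text node equals literal."""
    return any(node == literal for node in text_children(elem))

def string_value_equals(elem, literal):
    """Emulates elem[.=literal]: compares the concatenated descendant text."""
    return "".join(elem.itertext()) == literal

root = ET.fromstring(DOC)
first_c = root.find("./A/B/C")  # the C inside the first B

results = {
    literal: (any_text_equals(first_c, literal),
              string_value_equals(first_c, literal))
    for literal in ["hello", "world", "helloworld", "helloEEEworld"]
}
for literal, (by_text, by_string) in results.items():
    print(f"{literal!r}: text()= {by_text}, .= {by_string}")
```

"hello" and "world" each match under the text() emulation because one text node equals them, "helloworld" matches neither form because no single node has that value, and "helloEEEworld" matches only the string-value comparison.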
har07
  • Thanks for the response. 1) Okay, so the `C=` syntax requires the xpath `/root/A/B[C="helloEEEworld"]` to find the match. Apparently the C node's *string-value* is all the text within its borders concatenated together. 2) Why doesn't the xpath `/root/A/B[C[text()="helloworld"]]` match anything? As far as I can tell, the two direct child text nodes of C are `"hello"` and `"world"`. Ah, okay. In 1), the only node in the nodeset is C. In 2), there are two nodes in the nodeset, namely the two text nodes.... – 7stud Jan 22 '18 at 14:11
  • ...And the rule is convert each node in the nodeset to a string, then compare the result to what's on the right hand side of the `=` sign. In 1), when you convert the only node in the nodeset (namely the C node) to a string, you get `"helloEEEworld"`. In 2), when you convert the first node in the nodeset, namely the "hello" text node, you get "hello", which is compared to what is on the rhs of the `=` sign, then the second node in the nodeset, namely the "world" text node, is converted to the string "world", and then "world" is compared to what is on the rhs of the `=` sign. – 7stud Jan 22 '18 at 14:14