XPath: difference between "//*[contains(.,'sometext')]" and "//*[contains(text(),'sometext')]"

Question

I want to clearly understand the difference between the XPath expressions "//*[contains(.,'sometext')]" and "//*[contains(text(),'sometext')]".
From this great answer I understand that text() returns a set of individual nodes, while . in a predicate evaluates to the string concatenation of all text nodes.
OK, but when I use [contains(.,'sometext')] or [contains(text(),'sometext')], I would expect both to return the same number of elements, since in both cases we are checking for nodes that contain sometext either in themselves or in some of their children. Right? It should not matter whether we check that any one of an element's text nodes contains sometext or that the string concatenation of all its text nodes contains it. Both should give the same number of matches.
However, if we test this, for example on this page, I see 104 matches for //*[contains(text(),'selenium')] while //*[contains(.,'selenium')] gives 442 matches.
So, what causes this difference?

Are you sure that `//*[contains(text(),'selenium')]` is proper syntax? This test http://xpather.com/OD0nAwzr throws an error. — Alexey R., Nov 25 '22 at 09:32
Hm.. When I put either of these XPaths into the dev tools -> Elements search field, both return matches, but not the same number of matches, as I mentioned. — Prophet, Nov 25 '22 at 09:36

Alexey R. · Accepted Answer · 2022-11-25T14:22:13.787

Let me share my understanding using this XML.

<test>
  <node>
    selenium
    <node2>
      selenium
    </node2>
  </node>
  <node>
    selenium
  </node>
</test>

First of all, text() returns a list of text nodes (a node-set), not a string.

contains() takes two arguments, and the first one must be a string. So //*[contains(text(),'selenium')] will not always work: in XPath 2.0 it fails when text() supplies more than one node to contains().

In my example, the whitespace between the tags also forms text nodes, so an element such as the first <node> has more than one text child.

This is why your //*[contains(text(),'selenium')] query failed in my test. Browsers most likely accept it because they implement XPath 1.0, where a node-set passed as a string argument is converted to the string value of its first node, so the query runs but only the first text child is tested.
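
To make the point about whitespace text nodes concrete, here is a minimal sketch that lists the direct text children of elements in the formatted XML. It assumes Python's standard xml.etree.ElementTree and does not run an XPath engine; the helper name text_children is hypothetical and only illustrates what text() would select.

```python
import xml.etree.ElementTree as ET

XML = """<test>
  <node>
    selenium
    <node2>
      selenium
    </node2>
  </node>
  <node>
    selenium
  </node>
</test>"""


def text_children(elem):
    """Return the direct text-node children of elem, in document order.

    ElementTree stores them as elem.text (before the first child) and
    child.tail (after each child), so both places are collected.
    """
    parts = [elem.text] + [child.tail for child in elem]
    return [p for p in parts if p]  # drop None and empty strings


root = ET.fromstring(XML)
first_node = root[0]

# <test> has only whitespace text children, one around each child element.
print(len(text_children(root)))        # 3

# The first <node> has two text children: "selenium" plus indentation,
# and the whitespace after </node2>.
print(len(text_children(first_node)))  # 2
```

With more than one text child, text() hands contains() several nodes, which is the case a strict XPath 2.0 processor rejects.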

Now let's collapse that XML to get rid of the whitespace noise and look at how the two approaches differ:

<test><node>selenium<node2>selenium</node2></node><node>selenium</node></test>

1. Using text():

Here is what https://www.freeformatter.com/xpath-tester.html returns for //*[contains(text(),'selenium')]:

Element='<node>selenium<node2>selenium</node2>
</node>'
Element='<node2>selenium</node2>'
Element='<node>selenium</node>'

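Since //* selects every element in the tree, the matches are /test/node[1], which has a text node of its own containing selenium, plus /test/node[1]/node2 and /test/node[2]. /test is not matched because it has no text node of its own.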
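
The same result can be reproduced with a small emulation. This is only a sketch, assuming Python's standard xml.etree.ElementTree and XPath 1.0 behaviour (a node-set used as a string becomes the string value of its first node); first_text_child and matches_text are hypothetical helper names, not part of any XPath library.

```python
import xml.etree.ElementTree as ET

XML = "<test><node>selenium<node2>selenium</node2></node><node>selenium</node></test>"


def first_text_child(elem):
    """Return the first direct text node of elem, or "" if it has none."""
    for part in [elem.text] + [child.tail for child in elem]:
        if part:
            return part
    return ""


def matches_text(elem, needle):
    # Emulates contains(text(), needle) under XPath 1.0 rules:
    # only the first text child of the element is examined.
    return needle in first_text_child(elem)


root = ET.fromstring(XML)
hits = [e.tag for e in root.iter() if matches_text(e, "selenium")]
print(hits)  # ['node', 'node2', 'node']
```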

2. Using . :

For //*[contains(.,'selenium')] it returns:

Element='<test>
   <node>selenium<node2>selenium</node2>
   </node>
   <node>selenium</node>
</test>'
Element='<node>selenium<node2>selenium</node2>
</node>'
Element='<node2>selenium</node2>'
Element='<node>selenium</node>'

Why? Because the predicate tests each element's string value: /test is converted to seleniumseleniumselenium, /test/node[1] to seleniumselenium, /test/node[1]/node2 to selenium, and finally /test/node[2] to selenium.

So this makes the difference. Depending on how deep your nesting is, the two approaches may differ more or less significantly.
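
The string values listed above can be checked with a short sketch. It assumes Python's standard xml.etree.ElementTree, and the helper string_value is a hypothetical name that joins all descendant text, which is how . is converted to a string inside contains().

```python
import xml.etree.ElementTree as ET

XML = "<test><node>selenium<node2>selenium</node2></node><node>selenium</node></test>"


def string_value(elem):
    """Concatenate every text node under elem, in document order."""
    return "".join(elem.itertext())


root = ET.fromstring(XML)

# String value of each element, in document order.
values = [(e.tag, string_value(e)) for e in root.iter()]
for tag, value in values:
    print(tag, value)

# Emulates //*[contains(., 'selenium')]: every element whose string value matches.
hits = [tag for tag, value in values if "selenium" in value]
print(hits)  # ['test', 'node', 'node2', 'node']
```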

Thanks for your explanations Alexey. This makes some sense but it is still not clear enough for me. I will try to learn this subject more. — Prophet, Nov 25 '22 at 12:58
As for why xpather shows an error for the `//*[contains(text(),'selenium')]` expression while the browser accepts it, I think the explanation is here: https://stackoverflow.com/a/69915361/3485434 — Prophet, Nov 25 '22 at 13:00
@Prophet In short, when traversing the tree, for each current element `.` takes all the text anywhere under that node and joins it before testing `contains`, while `text()` takes only the text of the current node itself, without its descendants. — Alexey R., Nov 25 '22 at 13:14
@Prophet One more thing: `.` can produce extra results. When you have nodes `A`->`B`->`C` and both `B` and `C` contain the text `selenium` while `A` does not, `A` is included when the XPath processor visits it if you use `.`, but not if you use `text()`, because `A` has no text nodes of its own. — Alexey R., Nov 25 '22 at 13:20
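
The `A`->`B`->`C` case from the last comment can be sketched as follows. This is an emulation with Python's standard xml.etree.ElementTree rather than a real XPath evaluation; the XML document and the helper names string_value and first_text_child are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: B and C carry the text, A has no text of its own.
XML = "<A><B>selenium<C>selenium</C></B></A>"


def string_value(elem):
    """All text under elem, joined: what `.` becomes inside contains()."""
    return "".join(elem.itertext())


def first_text_child(elem):
    """First direct text node of elem, or "": what text() yields in XPath 1.0."""
    for part in [elem.text] + [child.tail for child in elem]:
        if part:
            return part
    return ""


root = ET.fromstring(XML)
dot_hits = [e.tag for e in root.iter() if "selenium" in string_value(e)]
text_hits = [e.tag for e in root.iter() if "selenium" in first_text_child(e)]

print(dot_hits)   # ['A', 'B', 'C']
print(text_hits)  # ['B', 'C']
```

Every ancestor of a matching text node is counted by the `.` form, which is why its match count grows with nesting depth.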

score 1 · Answer 2 · answered Nov 25 '22 at 10:10

This thread explains the difference between dot and text() pretty well: XPath: difference between dot and text()

Alex Karamfilov

If you do not add any explanation and only refer to a different question, it is better to vote to mark this one as a duplicate. – Alexey R. Nov 25 '22 at 10:13
Here you referred to exactly the question I mentioned in my question. So no, this does not answer my question. At least I don't see my question answered there. – Prophet Nov 25 '22 at 12:55
