I have an XPath string //*[normalize-space() = "some sub text"]/text()/..
which works fine if the text I am looking for is in a node that does not have multiple text sub-nodes, but if it does then it won't work, so I am trying to combine it with contains() as follows: //*[contains(normalize-space(), "some sub text")]/text()/..
This does work, but it always returns the body and html tags as well as the p tag which contains the text. How can I change it so it only returns the p tag?

Share HTML code sample for the same along with current and desired output – Andersson Nov 09 '18 at 17:47
1 Answer
It depends exactly what you want to match.
The most likely scenario is that you want to match some text if it appears anywhere in the normalized string value of the element, possibly split across multiple text nodes at different levels, for example in any of the following:
<p>some text</p>
<p>There was some text</p>
<p>There was <b>some text</b></p>
<p>There <b>was</b> some text</p>
<p>There was <b>some</b> <!--italic--> <i>text</i></p>
<p>There was <b>some</b> text</p>
If that's the case, then use //p[contains(normalize-space(.), "some text")].
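As a rough check of that claim, here is a minimal Python sketch that emulates the predicate by hand. It uses only the standard library, whose ElementTree XPath subset has no contains() or normalize-space(), so the helper normalized_string_value is a hypothetical stand-in for normalize-space(.), not a real XPath evaluation.

```python
import xml.etree.ElementTree as ET

# The six sample paragraphs listed in the answer.
SAMPLES = [
    "<p>some text</p>",
    "<p>There was some text</p>",
    "<p>There was <b>some text</b></p>",
    "<p>There <b>was</b> some text</p>",
    "<p>There was <b>some</b> <!--italic--> <i>text</i></p>",
    "<p>There was <b>some</b> text</p>",
]


def normalized_string_value(element):
    """Hypothetical stand-in for XPath normalize-space(.) on an element."""
    # itertext() yields every descendant text node in document order;
    # joined together they form the element's string value.
    string_value = "".join(element.itertext())
    # split() + join trims both ends and collapses whitespace runs.
    # (Python's split() treats a few more characters as whitespace than XPath does.)
    return " ".join(string_value.split())


# One boolean per sample: does the normalized string value contain the phrase?
results = [
    "some text" in normalized_string_value(ET.fromstring(sample))
    for sample in SAMPLES
]
print(results)
```

Every sample should report a match, including the ones where the phrase is split across b and i elements or interrupted by a comment.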
As you point out, using //* with this predicate will also match ancestor elements of the relevant element. The simplest way to fix this is by using //p to say what element you are looking for. If you don't know what element you are looking for, then in XPath 3.0 you could use
innermost(//*[contains(normalize-space(.), "some text")])
but if you have the misfortune not to be using XPath 3.0, then you could do (//*[contains(normalize-space(.), "some text")])[last()], though this doesn't do quite the same thing if there are multiple paragraphs with the required content, because [last()] keeps only the final match in document order.
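The difference between innermost() and [last()] can be sketched in plain Python. This is an emulation under stated assumptions, not an XPath engine: the page markup, the id attributes, and the helper names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with two paragraphs that contain the phrase.
PAGE = (
    "<html><body>"
    "<div><p id='a'>There was <b>some</b> text</p><p id='b'>nothing here</p></div>"
    "<p id='c'>some text again</p>"
    "</body></html>"
)
PHRASE = "some text"


def normalized_string_value(element):
    """Hypothetical stand-in for XPath normalize-space(.) on an element."""
    return " ".join("".join(element.itertext()).split())


def contains_phrase(element):
    return PHRASE in normalized_string_value(element)


root = ET.fromstring(PAGE)

# Emulates //*[contains(normalize-space(.), "some text")]:
# root.iter() walks the root and all descendants in document order.
hits = [el for el in root.iter() if contains_phrase(el)]

# Emulates innermost(...): keep a hit only if none of its descendants is also a hit.
# el.iter() yields el itself first, so skip the first item.
innermost = [
    el for el in hits
    if not any(contains_phrase(d) for d in list(el.iter())[1:])
]

# Emulates (...)[last()]: only the final hit in document order.
last_only = hits[-1]

print([el.tag for el in hits])
print([el.get("id") for el in innermost])
print(last_only.get("id"))
```

In this sketch the unfiltered hits include html, body and div as well as the two paragraphs; the innermost filter keeps both paragraphs, whereas the [last()] approach keeps only the second one.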
If you don't want to match all of the above, but want to be more selective, then you need to explain your requirements more clearly.
Either way, use of text() in a path expression is generally a code smell, except in the rare cases where you want to select text in an element only if it is not wrapped in other tags.
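To illustrate why testing individual text nodes tends to go wrong, here is a small Python sketch (again a hand-rolled emulation with the standard library, using one of the sample paragraphs) that compares the element's own text nodes with its full string value.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>There was <b>some</b> text</p>")

# The element's own text nodes, roughly what p/text() would select:
# the text before the first child plus the tail after each child.
own_text_nodes = [p.text or ""] + [child.tail or "" for child in p]

# The normalized string value, which also includes text inside <b>.
string_value = " ".join("".join(p.itertext()).split())

any_text_node_matches = any("some text" in node for node in own_text_nodes)
string_value_matches = "some text" in string_value

print(own_text_nodes)
print(any_text_node_matches, string_value_matches)
```

No single text node holds the whole phrase here, because "some" sits inside the b element, while the string value of the paragraph does contain it.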
