For example, take this HTML:

<div>
    <span></span> I want to find this <b>this works ok</b>.
</div>

I want to find a DIV that contains the text "I want to find this" and then grab all of the text inside that DIV, including the text of its child elements.

My XPath, //*[contains(text(), 'I want to find this')], does not match anything.

If I use //*[contains(text(), 'this works')] instead, it matches, but I want to find the DIV based on the text "I want to find this".

However, if I remove the <span></span> from that HTML, it works. Why is that?

Umair Ayub
  • Please update the title: *"Why this Xpath not working?"* is not informative at all. Also note that `//*[contains(text(), 'this works')]` doesn't actually do what you want. It can only return `b`, not `div` – Andersson Oct 02 '17 at 10:07
  • Possible duplicate of [Testing text() nodes vs string values in XPath](https://stackoverflow.com/questions/34593753/testing-text-nodes-vs-string-values-in-xpath) – kjhughes Oct 02 '17 at 12:05
  • @Umair, if you would like a solution using a CSS selector, there is one for this job. – SIM Oct 02 '17 at 15:57
  • @Shahin I actually did it with the contains selector – Umair Ayub Oct 02 '17 at 15:58

3 Answers

2

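In contains(text(), ...), only the first text node of the element is tested. Here that is the whitespace before the first inner element (<span>), so the match fails. You can replace text() with . to search the string value of the current node instead.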

//div[contains(., 'I want to find this')]

This searches the string value of the current node, which is the concatenation of all of its descendant text nodes.
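
To make the difference concrete, here is a minimal sketch using only the standard library's xml.etree.ElementTree (an assumption made for illustration; it works here because the snippet happens to be well-formed XML). It does not evaluate any XPath. It only lists the div's direct text nodes and its full string value. The helper name child_text_nodes is hypothetical.

```python
import xml.etree.ElementTree as ET

html = """<div>
    <span></span> I want to find this <b>this works ok</b>.
</div>"""

div = ET.fromstring(html)


def child_text_nodes(element):
    """Return the element's direct text nodes in document order.

    Hypothetical helper: this mirrors what text() selects in XPath.
    ElementTree stores them as element.text plus each child's .tail.
    """
    nodes = []
    if element.text is not None:
        nodes.append(element.text)
    for child in element:
        if child.tail is not None:
            nodes.append(child.tail)
    return nodes


needle = "I want to find this"
text_nodes = child_text_nodes(div)

# The first text node is only the whitespace before <span>.
first_text_node = text_nodes[0]

# The string value (what "." gives you) joins all descendant text nodes.
string_value = "".join(div.itertext())

print(repr(first_text_node))
print(needle in first_text_node)  # first text node alone: no match
print(needle in string_value)     # full string value: match
```

The phrase lives in the second text node (the one after <span>), which is why a test against only the first text node misses it.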

To grab all of the text, you can use node.itertext() to iterate over every inner text node, if you are using lxml:

from lxml import etree

html = """
<div>
    <span></span> I want to find this <b>this works ok</b>.
</div>
"""

root = etree.fromstring(html, etree.HTMLParser())
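# "." tests the div's full string value, so the text after <span> is included
for div in root.xpath('//div[contains(., "I want to find this")]'):
    # itertext() yields every descendant text node in document order
    print(''.join(div.itertext()))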
# =>    I want to find this this works ok.
CtheSky
  • Caveat: it's not quite true that text() only gets the first text node. Rather, under XPath 1.0, the contains() function ignores all but the first node in the supplied argument. Under XPath 2.0, the contains() function will throw an error if the first argument is a list containing more than one item. But the solution is correct for all XPath versions. – Michael Kay Oct 02 '17 at 09:55
  • And note the general principle: 95% of the time when people write `text()`, they should be writing `.` instead. – Michael Kay Oct 02 '17 at 09:56
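
The XPath 1.0 behaviour described in the comments above can be checked with a short sketch. It assumes lxml (which evaluates XPath 1.0) and reuses the HTML from the question; the variable names and the second document without the <span> are made up for illustration.

```python
from lxml import etree

html = """
<div>
    <span></span> I want to find this <b>this works ok</b>.
</div>
"""
# Same markup with the empty <span> removed (assumed variant, for comparison).
html_without_span = """
<div>
     I want to find this <b>this works ok</b>.
</div>
"""

needle = "I want to find this"
root = etree.fromstring(html, etree.HTMLParser())
root_without_span = etree.fromstring(html_without_span, etree.HTMLParser())

# contains(text(), ...) converts the node-set to a string using only its
# first node, which here is the whitespace before <span>.
first_only = root.xpath("//div[contains(text(), $needle)]", needle=needle)

# Testing each text node separately matches if ANY direct text node contains it.
any_text_node = root.xpath("//div[text()[contains(., $needle)]]", needle=needle)

# "." uses the string value of the div (all descendant text concatenated).
string_value = root.xpath("//div[contains(., $needle)]", needle=needle)

# Without the <span>, the phrase is part of the first text node, so text() works.
without_span = root_without_span.xpath(
    "//div[contains(text(), $needle)]", needle=needle
)

print(len(first_only), len(any_text_node), len(string_value), len(without_span))
```

If this runs as expected, only the first query comes back empty, which is consistent with the explanation of why removing the <span> makes the original expression match.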
0

Try using //*[text()=' I want to find this ']. This will select the div element, and you can then use the getText() method to get its text.

akash
  • This checks for the exact text. I want to check whether a DIV contains that text, because in my case there can be a longer string, like `I want to find this bla bla`, and then your answer will not work – Umair Ayub Oct 02 '17 at 09:28
0

You can try replacing text() with string():

//div[contains(string(), " I want to find this")]

Or you can check that the text node following the span contains the text:

//div[contains(span/following-sibling::text(), " I want to find this")]
Zakaria Shahed