I'm using lxml for HTML screen scraping and need to select an element by its text(), much as another question does with pure XML. However, whatever I try, I get an "invalid predicate" error. I've simplified the problem to this example:

import lxml.html
sample_html = "<div><h2>test string</h2><h2>other string</h2></div>"
sample_tree = lxml.html.fromstring(sample_html)
sample_tree.findall('.//h2[text()="test string"]')

While this should be valid, I continually get the error:

  File "<string>", line unknown
SyntaxError: invalid predicate

Any hints on how to properly get lxml to select an element by text() when parsing HTML?

Pridkett

1 Answer

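The expression itself is valid XPath, but findall() only understands ElementPath, a limited subset of XPath that does not accept text() in a predicate. Use the .xpath() method instead, which supports full XPath 1.0: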

sample_tree.xpath('.//h2[text()="test string"]')

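Note that you may also use . in place of text() in this case. The two are not identical in general: . compares the element's whole string value (including text inside child elements), while text() compares its direct text nodes. For a plain h2 like this one, both match:

.//h2[. = "test string"]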
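Putting it together, here is a minimal sketch based on the sample from the question. The variable names findall_error, matched_text, and dot_matches are only illustrative, and the exact behavior of findall() may differ between lxml versions, so the failure is caught rather than assumed:

```python
import lxml.html

sample_html = "<div><h2>test string</h2><h2>other string</h2></div>"
sample_tree = lxml.html.fromstring(sample_html)

# findall() goes through ElementPath; record the error if this
# lxml version rejects the text() predicate, as in the question.
try:
    sample_tree.findall('.//h2[text()="test string"]')
    findall_error = None
except SyntaxError as exc:
    findall_error = str(exc)

# xpath() evaluates full XPath 1.0, so the same expression works.
matches = sample_tree.xpath('.//h2[text()="test string"]')
matched_text = [el.text for el in matches]

# Equivalent here: compare the element's string value instead.
dot_matches = sample_tree.xpath('.//h2[. = "test string"]')

print(findall_error)
print(matched_text)
print(len(dot_matches))
```

Both xpath() calls should return a list containing only the first h2 element.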
alecxe