I'm using lxml for HTML screen scraping and I need to select an element by its text(), in a similar way to what is done in another question with pure XML. However, no matter what I try, I get an "invalid predicate" error. I've simplified it down to this example:
import lxml.html
sample_html = "<div><h2>test string</h2><h2>other string</h2></div>"
sample_tree = lxml.html.fromstring(sample_html)
sample_tree.findall('.//h2[text()="test string"]')
While this should be valid, I continually get the error:
File "<string>", line unknown
SyntaxError: invalid predicate
Any hints on how to properly get lxml to select an element by text() when parsing HTML?
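
For comparison, here is a minimal sketch of the same query run through the element's xpath() method instead of findall(). As far as I understand, findall() goes through lxml's ElementPath implementation, which only accepts a subset of XPath, while xpath() evaluates full XPath 1.0, so this may be where the difference comes from. The variable name `matches` is only for illustration.

```python
import lxml.html

sample_html = "<div><h2>test string</h2><h2>other string</h2></div>"
sample_tree = lxml.html.fromstring(sample_html)

# Same expression as in the findall() call, but evaluated by xpath(),
# which understands the text() node test inside a predicate.
matches = sample_tree.xpath('.//h2[text()="test string"]')

# Print the text of every matching <h2> element.
print([element.text for element in matches])
```

If that is indeed the cause, xpath() would be the method to use whenever the expression needs text() or other XPath functions.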