
I would like to retrieve the literal text of an element selected by XPath using lxml. Unfortunately, lxml converts entity references, as shown below:

from lxml import etree
tree = etree.HTML('<html><body><div>&amp;</div></body></html>')
text = tree.xpath('/html[1]/body[1]/div[1]')[0].text
print(text)

Output: &
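The conversion happens when the document is parsed, not when `.text` is read: the parser decodes references into plain characters before the tree exists. A quick way to see this is to feed the snippet above three equivalent spellings of the ampersand (named, decimal, hexadecimal). This is only a sketch to illustrate the behaviour; the list `sources` and the dict `decoded` are illustrative names, not part of the original question.

```python
from lxml import etree

# Three spellings of the same character in HTML source:
# a named entity, a decimal reference and a hexadecimal reference.
sources = ["&amp;", "&#38;", "&#x26;"]

decoded = {}
for ref in sources:
    tree = etree.HTML(f"<html><body><div>{ref}</div></body></html>")
    div = tree.xpath("/html[1]/body[1]/div[1]")[0]
    # .text holds character data *after* the parser has decoded references,
    # so the original spelling is no longer stored anywhere in the tree.
    decoded[ref] = div.text

for ref, text in decoded.items():
    print(f"{ref!r} -> {text!r}")
```

All three end up as the same single `&` character, which is why there is no attribute on the element that still holds the literal source text.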

Is there some way to prevent the conversion, or am I stuck having to back-convert it?

Cory Nezin
    Similar questions: https://stackoverflow.com/q/22263599/407651, https://stackoverflow.com/q/27687677/407651 – mzjn Apr 14 '21 at 05:35
