Prevent Python lxml from replacing entityref with actual character

Asked Apr 12 '21 at 21:27

Active Apr 12 '21 at 21:27


I would like to retrieve the literal text of an element selected via XPath using lxml. Unfortunately, lxml converts entity references into the characters they stand for (here `&amp;` becomes `&`), as shown below:

```python
from lxml import etree

tree = etree.HTML('<html><body><div>&amp;</div></body></html>')
text = tree.xpath('/html[1]/body[1]/div[1]')[0].text
print(text)  # prints: &
```
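A minimal sketch, assuming the goal is to get escaped markup back out rather than the decoded `.text`: the parsed tree stores only the decoded character, and `etree.tostring()` escapes it again when serializing. The loop also feeds in the numeric reference `&#38;` (an input not in the question) to show that both spellings parse to the same text, so the tree does not record which one was in the source.

```python
from lxml import etree

# Two inputs that spell the same character differently.
for source in ('<div>&amp;</div>', '<div>&#38;</div>'):
    tree = etree.HTML(f'<html><body>{source}</body></html>')
    div = tree.xpath('/html[1]/body[1]/div[1]')[0]
    # .text holds the decoded character in both cases: '&'
    print(repr(div.text))
    # Serializing the element escapes the '&' again as '&amp;'.
    print(etree.tostring(div, method='html', encoding='unicode', with_tail=False))
```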

Is there a way to prevent this conversion, or am I stuck converting the text back afterwards?
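If converting back is acceptable, one option is to rebuild the element's inner content yourself. The sketch below uses a hypothetical helper, `inner_markup()` (not part of lxml), that escapes `.text` with the standard library's `html.escape()` and serializes each child element (including its tail) with `etree.tostring()`. As far as I know, lxml's HTML parser has no option to keep `&amp;` or numeric references unresolved, so this re-escapes the decoded text rather than returning the source verbatim: `&#38;` in the input would come back as `&amp;`.

```python
import html

from lxml import etree


def inner_markup(el):
    """Hypothetical helper: return an element's inner content as escaped markup.

    Re-escapes the decoded text; the original spelling of a reference
    (e.g. '&#38;' versus '&amp;') is not recoverable from the tree.
    """
    # Leading text of the element, escaped (&, <, >) without touching quotes.
    parts = [html.escape(el.text or '', quote=False)]
    for child in el:
        # tostring() escapes the child's text and, with with_tail=True, its tail.
        parts.append(etree.tostring(child, method='html', encoding='unicode', with_tail=True))
    return ''.join(parts)


tree = etree.HTML('<html><body><div>Tom &amp; Jerry <b>&lt;3</b> &amp; co</div></body></html>')
div = tree.xpath('/html[1]/body[1]/div[1]')[0]
print(div.text)           # the decoded text, containing a literal '&'
print(inner_markup(div))  # the same content with '&' and '<' escaped again
```

Serializing through `etree.tostring()` rather than escaping child text by hand keeps nested elements, their attributes, and their tails intact.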


Cory Nezin



Similar questions: https://stackoverflow.com/q/22263599/407651, https://stackoverflow.com/q/27687677/407651 – mzjn Apr 14 '21 at 05:35
