I have an HTML file:
<html>
<p>somestr
<sup>1</sup>
anotherstr
</p>
</html>
I would like to extract the text as:
somestr1anotherstr
but I can't figure out how to do it. I have written a to_sup() function that converts numeric strings to superscript, so the closest I get is something like:
for i in doc.xpath('.//p/text()|.//sup/text()'):
    if i.tag == 'sup':
        print to_sup(i),
    else:
        print i,
but the ElementStringResult values returned by xpath() don't seem to have a way to get the tag name, so I am a bit lost. Any ideas how to solve this?
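One possible approach, sketched here under some assumptions, is to skip the XPath text query and walk the tree directly. Each element's `.text` belongs to that element, but the text after a child (its `.tail`) belongs to the parent. In the sample, "anotherstr" is the tail of `<sup>`, so it has to be formatted in `<p>`'s context rather than `<sup>`'s. The sketch uses the standard library's `xml.etree.ElementTree` (Python 3) because the sample happens to be well-formed XML. lxml elements expose the same `.text`, `.tail`, `.tag` and child iteration, so `flatten()` should also work on a tree parsed with lxml. `to_sup()` below is only a hypothetical stand-in for the function mentioned in the question, and `flatten()` is a name chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the question's to_sup(): maps the digits 0-9
# to their Unicode superscript forms and leaves other characters unchanged.
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def to_sup(s):
    return s.translate(SUPERSCRIPT_DIGITS)


def flatten(elem, in_sup=False):
    """Concatenate the text under elem in document order, passing any text
    that sits inside a <sup> element through to_sup().

    A child's .tail lies inside the *current* element, so it is formatted
    with the current element's in_sup flag, not the child's.
    Surrounding whitespace of each text piece is stripped.
    """
    in_sup = in_sup or elem.tag == "sup"
    parts = []
    if elem.text:
        text = elem.text.strip()
        parts.append(to_sup(text) if in_sup else text)
    for child in elem:
        parts.append(flatten(child, in_sup))
        if child.tail:
            tail = child.tail.strip()
            parts.append(to_sup(tail) if in_sup else tail)
    return "".join(parts)


doc = ET.fromstring("<html>\n<p>somestr\n<sup>1</sup>\nanotherstr\n</p>\n</html>")
print(flatten(doc))  # prints somestr¹anotherstr
```

Stripping each piece matches the desired output "somestr1anotherstr", but it would also glue words together at element boundaries (e.g. `some <b>bold</b>` becomes "somebold"), so a real document may need a different whitespace rule. If you prefer to keep the XPath query, lxml's string results do provide `getparent()` and `is_tail`. Note, however, that for "anotherstr" `getparent()` returns the `<sup>` element (the text is its tail), so checking the parent's tag alone would wrongly superscript it. You would also need to check `is_tail`.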