Prevent python libxml2 transformation

Asked Jan 16 '15 at 15:20

Active Aug 01 '16 at 14:10

Viewed 42 times

When I parse HTML with libxml2 in Python via htmlParseDoc, the text is transformed. For example:

Orig HTML: Order Number & OrderID & Was Approved

Becomes: Order Number &amp; OrderID &amp; Was Approved

Non-visible control characters are also transformed, so replacing "&amp;" with the original character does not make the before and after strings equivalent. (I checked this by dumping the strings in hex format.)
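A hex comparison of this kind might look like the sketch below. It assumes both strings are available as Python 3 `str` values; the sample strings and the helper names `hex_dump` and `first_difference` are made up for illustration and are not part of libxml2.

```python
def hex_dump(text):
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


def first_difference(a, b):
    """Return the index of the first differing character, or -1 if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One string may be a prefix of the other.
    return -1 if len(a) == len(b) else min(len(a), len(b))


# Hypothetical before/after values standing in for the real documents.
original = "Order Number & OrderID"
parsed = "Order Number &amp; OrderID"

print(hex_dump(original))
print(hex_dump(parsed))
print(first_difference(original, parsed))
```

Dumping bytes rather than printing the strings makes control characters visible, and the index of the first mismatch shows where the two versions start to diverge.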

Does anyone know how to either 1) prevent the transformation from occurring, or 2) apply a transformation in the other direction to retrieve the original?

Thanks in advance.

edited Aug 01 '16 at 14:10

DaBler


asked Jan 16 '15 at 15:20

AnonPyDev

You could use [htmlParser](http://stackoverflow.com/questions/2087370/decode-html-entities-in-python-string) – fredtantini Jan 16 '15 at 15:25
Thank you. htmlParser with decode and then lowercasing the string works for two of my cases. I will post updates with the remaining issues. – AnonPyDev Jan 16 '15 at 15:57
One should note that `A & B` isn't valid HTML in the first place. Besides that, have a look [here](http://stackoverflow.com/questions/17423495/how-to-solve-ampersand-conversion-issue-in-xml). – dhke Aug 01 '16 at 14:18
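The entity-decoding approach suggested in the comments can be sketched with the Python 3 standard library. This assumes the serialized output is already a `str` and that only character references such as `&amp;` need to be reversed; it does not undo any other change the parser makes (control characters, for instance). The function name `restore_entities` is hypothetical.

```python
import html


def restore_entities(serialized):
    """Decode HTML character references such as &amp; back to literal characters."""
    return html.unescape(serialized)


# Example value modelled on the string from the question.
serialized = "Order Number &amp; OrderID &amp; Was Approved"
print(restore_entities(serialized))
```

Note that decoding is not a strict inverse of the parser's escaping: if the source already contained a literal `&amp;`, it would also be turned into `&`.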
