When I use libxml2 in Python to parse HTML via htmlParseDoc, the text gets transformed. For example,
Original HTML:
Order Number & OrderID & Was Approved
Becomes:
Order Number & OrderID & Was Approved
Additional non-visible control characters are also transformed, so replacing "&"
with the original characters does not make the before and after strings equivalent. (I checked this by dumping the strings in hex format.)
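For anyone who wants to reproduce that kind of check, here is a minimal sketch of a hex comparison using only the Python standard library. The helper names (`hex_dump`, `first_difference`) and the sample strings are hypothetical and are not the actual data from the parse; the sample simply assumes a non-breaking space was swapped for a regular space.

```python
def hex_dump(text, encoding="utf-8"):
    """Return the encoded bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))


def first_difference(before, after, encoding="utf-8"):
    """Return the byte offset of the first mismatch, or None if identical."""
    a = before.encode(encoding)
    b = after.encode(encoding)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One string is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Hypothetical sample: the two strings look the same when printed,
# but the first contains a non-breaking space (U+00A0).
before = "Order\u00a0Number"
after = "Order Number"

print(hex_dump(before))
print(hex_dump(after))
print(first_difference(before, after))
```

Comparing the encoded bytes rather than the printed text is what exposes differences that are invisible on screen.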
Does anyone know how to either 1) prevent the transformation from occurring or 2) apply a transformation in the other direction to recover the original?
Thanks in advance.