I have a script that takes XML as a string and attempts to parse it using xml.etree.ElementTree.
Here is an example of the code I am working with:
from xml.etree.ElementTree import fromstring
my_xml = """
<documents>
<record>Hello< &O >World</record>
</documents>
"""
xml = fromstring(my_xml)
When I run the code, I get a ParseError:
Traceback (most recent call last):
  File "C:/Code/Python/xml_convert.py", line 7, in <module>
    xml = fromstring(my_xml)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 3, column 18
As stated in Invalid Characters in XML, this is due to the text containing the HTML entities <, >, and &.
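To illustrate the cause, here is a minimal sketch. It assumes the text could be escaped before it is embedded in the XML, which is not the case for my real input, where the XML already arrives as a single string. The names raw_text and escaped_xml are only for illustration. With the text escaped by xml.sax.saxutils.escape, the same document parses and the original text comes back unchanged:

```python
from xml.etree.ElementTree import fromstring
from xml.sax.saxutils import escape

# The text content from the example above (illustrative variable name)
raw_text = "Hello< &O >World"

# escape() replaces &, <, and > with &amp;, &lt;, and &gt;
escaped_xml = "<documents><record>%s</record></documents>" % escape(raw_text)

# This parses without a ParseError because the text is now well-formed
root = fromstring(escaped_xml)
record_text = root.find("record").text
print(record_text)
```

This only works when the text is available separately from the markup, so it does not solve my case directly.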
How may I go about handling these entities so the XML reads them as plain text?