This question appears related to this one from 2013, but it didn't help me.
I need to parse a large (2 GB) XML file and plan to use Python 3.5.2 with ElementTree. I'm new to Python, but parsing works well until it reaches an escaped character (a named entity), such as:
<author>Sanjeev Sax&ouml;na</author>
returning:
test.xml
File "<string>", line unknown
ParseError: undefined entity &ouml;: line 5, column 19
My code looks something like this:
    import xml.etree.ElementTree as etree

    for event, elem in etree.iterparse('test_esc.xml'):
        # do something with the node
        pass  # placeholder so the loop body is valid Python
What's the best way to deal with this? Parsing the unescaped 'ö' actually works fine:
<author>Sanjeev Saxöna</author>
Is there an easy way to programmatically unescape the whole XML file?
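One possible approach, offered as a rough sketch rather than the definitive fix: stream the file line by line and replace named HTML entities with the characters they stand for before handing the result to ElementTree. The function names `replace_html_entities` and `unescape_file` are hypothetical, and the sketch assumes the file is UTF-8 and that an entity never spans a line break. It leaves the five entities XML predefines (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) untouched, since unescaping those would break the markup. It does not distinguish CDATA sections or comments, so entity-like text inside them would be rewritten too.

```python
import re
from html.entities import name2codepoint

# The five entities XML itself predefines must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named entity references such as &ouml; (numeric ones are already valid XML).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_html_entities(text):
    """Replace named HTML entities such as &ouml; with the character itself."""
    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML's own entities and unknown names exactly as they are.
            return match.group(0)
        return chr(name2codepoint[name])

    return ENTITY_RE.sub(substitute, text)


def unescape_file(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path line by line, so a 2 GB file never sits in memory.

    The encoding is an assumption; adjust it to match the XML declaration.
    """
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            dst.write(replace_html_entities(line))
```

With this in place, something like `unescape_file('test_esc.xml', 'test_clean.xml')` (the output name is made up here) would produce a copy that `etree.iterparse` should be able to read without hitting the undefined-entity error.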