I am reading XML files into Python with the following code:
import xml.etree.ElementTree as ET
tree = ET.parse(file_name)
For some reason, the source I am reading from appears to specify the wrong encoding in its files (it is correct for 10 years of the data I am reading, and then suddenly I get problems with the subsequent files).
Specifically, I get the following error:
xml.etree.ElementTree.ParseError: encoding specified in XML declaration is incorrect: line 1, column 30
I think the data is encoded in UTF-8, but the encoding specified in the file is UTF-16 (the first line of the file is <?xml version='1.0' encoding='UTF-16'?>). When I manually change the file text to say UTF-8, no error is raised and, as far as I can tell, everything is read correctly.
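The manual edit described above can be reproduced in memory before parsing. This is a minimal sketch, assuming the XML declaration sits at the very start of the file with no byte-order mark; parse_with_fixed_declaration is a hypothetical helper name, and the file on disk is left unmodified:

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the data, e.g. <?xml version='1.0' encoding='UTF-16'?>
# (assumes no byte-order mark precedes the declaration).
_DECL_ENCODING = re.compile(rb"^(<\?xml[^>]*?encoding\s*=\s*)(['\"])[^'\"]*\2")


def parse_with_fixed_declaration(file_name, encoding=b"UTF-8"):
    """Hypothetical helper: parse file_name after rewriting its declared
    encoding in memory. The file on disk is not changed."""
    with open(file_name, "rb") as f:
        data = f.read()
    # Rewrite only the first match, keeping the original quote character.
    data = _DECL_ENCODING.sub(
        lambda m: m.group(1) + m.group(2) + encoding + m.group(2),
        data,
        count=1,
    )
    # fromstring returns the root Element; wrap it so the result matches ET.parse.
    return ET.ElementTree(ET.fromstring(data))
```

This mirrors editing the file by hand, so it only helps if the bytes really are UTF-8 as suspected.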
How do I override the XML reader so that it treats the encoding as UTF-8 and ignores what is specified within the file?
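One possible approach, assuming Python 3's xml.etree.ElementTree: XMLParser accepts an encoding argument which, according to the standard library documentation, overrides the encoding specified in the XML file. The sketch below tries the normal parse first and only forces UTF-8 when a ParseError is raised, so the older, correctly labelled files are still read as declared; parse_xml is a hypothetical name:

```python
import xml.etree.ElementTree as ET


def parse_xml(file_name):
    """Hypothetical helper: parse normally, and retry with UTF-8 forced
    if the declared encoding turns out to be wrong."""
    try:
        return ET.parse(file_name)
    except ET.ParseError:
        # An explicit encoding on XMLParser takes precedence over the
        # encoding named in the file's XML declaration.
        parser = ET.XMLParser(encoding="utf-8")
        return ET.parse(file_name, parser=parser)
```

Files that fail for reasons unrelated to encoding will simply raise a ParseError again on the second attempt.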