Override encoding of xml reader in Python

Question

I am reading XML files into Python with the following code:

import xml.etree.ElementTree as ET
tree = ET.parse(file_name)

For some reason, the source I am reading from appears to have the wrong encoding specified in its files. The declaration is correct for 10 years of the data I am reading, and then I suddenly get problems with the subsequent files.

Specifically, the following error is raised:

xml.etree.ElementTree.ParseError: encoding specified in XML declaration is incorrect: line 1, column 30

I think the data is encoded in UTF-8, but the encoding declared in the file is UTF-16 (the first line of the file is <?xml version='1.0' encoding='UTF-16'?>). When I manually change the declaration to say UTF-8, no error is raised and, as far as I can tell, everything is read correctly.

How can I override the XML reader so that it treats the encoding as UTF-8 and ignores what is specified within the file?

Open the file manually, specify the encoding and pass the string to fromstring. You can try chardet to find out the actual encoding https://pypi.python.org/pypi/chardet — Padraic Cunningham, Apr 05 '16 at 18:33
Isn't this thread of any help? http://stackoverflow.com/questions/25796238/reading-xml-header-encoding — zezollo, Apr 05 '16 at 18:55
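
The suggestion in the first comment above could look roughly like the sketch below. It assumes the files really are UTF-8, as the question suspects. The helper name parse_as_utf8 is made up for illustration. Passing an XMLParser created with encoding="utf-8" relies on the documented behaviour that this argument overrides the encoding declared in the XML file.

```python
import xml.etree.ElementTree as ET


def parse_as_utf8(file_name):
    """Parse an XML file as UTF-8, ignoring the encoding in its declaration.

    Hypothetical helper; assumes the file content is actually UTF-8.
    """
    # Decode the file ourselves instead of trusting the XML declaration.
    with open(file_name, encoding="utf-8") as f:
        text = f.read()

    # An explicit encoding on the parser overrides the one declared in the file.
    parser = ET.XMLParser(encoding="utf-8")

    # fromstring returns the root Element, not an ElementTree.
    return ET.fromstring(text, parser=parser)
```

Note that this returns the root element rather than a tree, so if the rest of the code expects the result of ET.parse, it can be wrapped as ET.ElementTree(parse_as_utf8(file_name)). If the real encoding is not known in advance, it would have to be detected first (for example with chardet, as the comment mentions) and used in place of "utf-8".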
