
I am reading XML files into Python with this code:

import xml.etree.ElementTree as ET
tree = ET.parse(file_name)

For some reason, the source I am reading from appears to specify the wrong encoding in the file. The declaration is correct for 10 years of the data I am reading, and then I suddenly get problems with subsequent files.

Specifically, I get the following error:

xml.etree.ElementTree.ParseError: encoding specified in XML declaration is incorrect: line 1, column 30

I think the data is encoded in UTF-8, but the encoding specified in the file is UTF-16 (the first line of the file is <?xml version='1.0' encoding='UTF-16'?>). When I manually change the declaration to say UTF-8, no error is raised and, as far as I can tell, everything is read correctly.

How do I override the XML reader so that it treats the encoding as UTF-8 and ignores what is specified within the file?

kyrenia
  • Open the file manually, specify the encoding and pass the string to fromstring. You can try chardet to find out the actual encoding https://pypi.python.org/pypi/chardet – Padraic Cunningham Apr 05 '16 at 18:33
  • Isn't this thread of any help? http://stackoverflow.com/questions/25796238/reading-xml-header-encoding – zezollo Apr 05 '16 at 18:55
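
The first comment's suggestion (open the file yourself, choose the encoding, and hand the text to fromstring) could be sketched roughly as below. This is only a sketch under assumptions: the helper name parse_as_utf8 is hypothetical, the files are assumed to really be UTF-8 as the question suspects, and the misleading XML declaration is simply removed before parsing.

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading XML declaration such as
# <?xml version='1.0' encoding='UTF-16'?>
_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def parse_as_utf8(file_name):
    """Parse file_name as UTF-8, ignoring the encoding named in the file.

    Hypothetical helper: it assumes the bytes on disk are valid UTF-8.
    """
    with open(file_name, "rb") as f:
        raw = f.read()

    # "utf-8-sig" decodes UTF-8 and also drops a leading byte order mark.
    text = raw.decode("utf-8-sig")

    # Remove the declaration so the wrong encoding name is never seen.
    text = _XML_DECL.sub("", text, count=1)

    # fromstring returns the root Element; wrap it so the result
    # resembles what ET.parse(file_name) would have returned.
    return ET.ElementTree(ET.fromstring(text))
```

With this, tree = parse_as_utf8(file_name) would stand in for tree = ET.parse(file_name). If a file turns out not to be UTF-8 after all, the decode step raises UnicodeDecodeError instead of silently misreading the data. The standard library's ET.XMLParser also documents an encoding argument that overrides the encoding specified in the file, which may be a simpler option to try.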

0 Answers