I have two XML files containing a "ß" ("scharfes S" in German), starting with:
<?xml version="1.0" encoding="utf-16" standalone="yes"?>
and
I used the following code to read the UTF-8 file:
import xmltodict

with open('file.xml', encoding='utf-8') as file:
    f = file.read()
    xml = xmltodict.parse(f)
and this code for the UTF-16 file:
with open('file.xml', encoding='utf-16') as file:
    f = file.read()
    xml = xmltodict.parse(f)
For the UTF-16 file I get this error: UnicodeError: UTF-16 stream does not start with BOM
Changing everything to:
import os

with open('file.xml', encoding='utf-16') as file:
    file.seek(1, os.SEEK_SET)  # skip ahead before reading
    f = file.read()
    xml = xmltodict.parse(f)
where I tried different offsets (e.g. seek(1, ...), seek(2, ...), ...), doesn't help.
Then I checked the encoding with (Source)
alias vic="vim -c 'execute \"silent \!echo \" . &fileencoding | q'"
vic file.xml
> latin-1
Therefore I replaced encoding='utf-16' with encoding='latin-1'.
But now I get errors about the "ß" in the file, e.g. when trying "utf-16-le":
"'utf-16-le' codec can't decode bytes in position 12734-12735: illegal encoding"
Does anyone know what the problem is here? Or, more generally: how can I read XML files in Python with UTF-8 or UTF-16 encoding without getting BOM errors or errors about the character "ß"?
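In case it helps with diagnosing this, the raw bytes at the start of each file can be checked for a BOM directly. This is only a minimal sketch: the names BOMS, sniff_bom and sniff_file are hypothetical helpers, and the check only recognizes a BOM; it does not guess the encoding of a file that has none.

```python
import codecs

# Known byte-order marks. The UTF-32 marks come first because the
# UTF-32-LE mark begins with the same two bytes as the UTF-16-LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(head: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Inspect the first four bytes of a file for a BOM."""
    # Binary mode: text mode would already try to decode the bytes.
    with open(path, "rb") as fh:
        return sniff_bom(fh.read(4))
```

Calling sniff_file('file.xml') on each of the two files would show whether the file declared as UTF-16 really starts with a BOM.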
Thank you in advance!