xmltodict 'UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte'

Question

I am trying to read an XML.dump file with the 'xmltodict' library in Python 3 to build a dictionary from it. This is the code I used:

import xmltodict

with open('file1.xml.dump') as fd:
    content = fd.read()
    doc = xmltodict.parse(content)

The error that I got is: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Does anyone know what this error means and how to fix it?

I also added encoding='UTF-8' in the with open statement, and I get the same error.

@HimanshuPoddar `open` used `UTF-8` by default, that's what caused the error. The file isn't UTF-8 — Panagiotis Kanavos, Jul 05 '22 at 15:39
What were you talking about? Why do you assume the file uses UTF16 or UTF32 instead of eg Latin1? — Panagiotis Kanavos, Jul 05 '22 at 15:41
Does this answer your question? [Python: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte](https://stackoverflow.com/questions/62170614/python-unicodedecodeerror-utf-8-codec-cant-decode-byte-0x80-in-position-0) — Ulrich Eckhardt, Jul 05 '22 at 15:45
Sarah, btw, this has nothing to do with XML, since that code is never called. See [mcve], too. Further, I found this answer by just searching for the error message online. — Ulrich Eckhardt, Jul 05 '22 at 15:46
@Ulrich Eckhardt, thanks for trying to help me. I saw that answer, but it did not work for me. I think the problem is related to the xml.dump file that I have, but I am not sure, and I do not know what I should do to fix it. — Sara, Jul 05 '22 at 15:50
@Panagiotis Kanavos, I've noticed that this file is not UTF-8, but do you know what I should use instead? — Sara, Jul 05 '22 at 15:53
We can't know what encoding was used for that file and why it wasn't UTF8, the de-facto standard, *especially* for XML files. Are you sure it's even XML? XML documents start with `<`. Either the file isn't XML at all or contains extra text before the XML part — Panagiotis Kanavos, Jul 05 '22 at 15:59
What is `XML.dump` anyway? That's not a common file format or even name, much less an XML file format. How was this file generated? Can you open it with a text editor? — Panagiotis Kanavos, Jul 05 '22 at 16:02
@Sören Do you think adding the correct encoding will fix this problem? — Sara, Jul 05 '22 at 16:02
@Panagiotis Kanavos, I am sure that it was an XML file. Then someone did something to it (parsed it, I think, or something similar), and now I have the file in this format: name.xml.dump. I tried to open it to see what is inside, but apparently I cannot: the computer just crashed, I think because the file is too large. — Sara, Jul 05 '22 at 16:06
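
The comments above ask whether the file is XML at all. A minimal sketch of how that could be checked, assuming the file is opened in binary mode so that no decoding takes place: it reads only the first few bytes, so the size of the file does not matter, and compares them with a few well-known signatures. The helper names `read_head` and `guess_kind` are hypothetical, and the list of signatures is illustrative, not exhaustive. A first byte of 0x80 is what a Python pickle (protocol 2 or later) starts with, but that is only one possibility and nothing in the question confirms it.

```python
def read_head(path, count=16):
    """Return the first `count` raw bytes of a file, without decoding them."""
    with open(path, "rb") as fd:
        return fd.read(count)


def guess_kind(head):
    """Make a rough guess about a file's format from its first bytes."""
    # A UTF-16 byte order mark means the text is not UTF-8.
    if head[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "utf-16 text"
    # gzip-compressed data starts with the magic bytes 1f 8b.
    if head[:2] == b"\x1f\x8b":
        return "gzip"
    # Pickle protocol 2 and later starts with the PROTO opcode 0x80.
    if head[:1] == b"\x80":
        return "possibly a Python pickle"
    # Skip an optional UTF-8 byte order mark, then leading whitespace.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    if head.lstrip().startswith(b"<"):
        return "xml-like"
    return "unknown"
```

For the file in the question this would be called as `guess_kind(read_head('file1.xml.dump'))`. If the result points to a pickle, load it with `pickle` only if you trust where the file came from, since unpickling can execute arbitrary code.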

score 1 · Answer 1 · answered Feb 07 '23 at 12:13

I just ran into this error.

In my case it was caused by a float assigned to the #text node.

      'field': {
        '@attribute': 'm3',
        '#text': 10.076
      }

The assignment is valid, but it raises the encoding error.

The easiest fix is to assign the value as an f-string, like this:

      'field': {
        '@attribute': 'm3',
        '#text': f'{10.076}'
      }

So I would suggest reviewing your dictionary and verifying that all the values in it are indeed strings.
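
Reviewing a large dictionary by hand can be tedious, so here is a minimal sketch of doing the same conversion recursively. The helper name `stringify_leaves` is hypothetical and not part of xmltodict. It assumes the dictionary contains only dicts, lists, tuples and scalar values, and it leaves `None` untouched on the assumption that it stands for an empty element.

```python
def stringify_leaves(value):
    """Return a copy of `value` with every non-string leaf converted to str."""
    if isinstance(value, dict):
        return {key: stringify_leaves(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [stringify_leaves(item) for item in value]
    # Strings are already fine; None is kept as-is (assumed empty element).
    if value is None or isinstance(value, str):
        return value
    return str(value)


doc = {'field': {'@attribute': 'm3', '#text': 10.076}}
clean = stringify_leaves(doc)
print(clean)  # {'field': {'@attribute': 'm3', '#text': '10.076'}}
```

The cleaned dictionary can then be passed on in place of the original one.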
