
I am trying to read an `.xml.dump` file with the `xmltodict` library in Python 3 to build a dictionary from it. The code I used is:

```python
import xmltodict

with open('file1.xml.dump') as fd:
    content = fd.read()
    doc = xmltodict.parse(content)
```

The error I get is: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte`

Does anyone know what this error means and how to fix it?

I also added `encoding='UTF-8'` to the `open()` call and got the same error.
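As one of the comments points out, the exception is raised by `fd.read()` while Python decodes the bytes as UTF-8, so `xmltodict.parse` is never reached. The following minimal sketch reproduces the same message using only the standard library; the sample file and its contents are hypothetical (any file whose first byte is `0x80` behaves the same way):

```python
import os
import tempfile


def read_as_utf8(path):
    """Read a file in text mode, decoding it as UTF-8 (what the question's code ends up doing)."""
    with open(path, encoding="utf-8") as fd:
        return fd.read()


# Hypothetical sample file whose first byte is 0x80.
with tempfile.NamedTemporaryFile(delete=False, suffix=".xml.dump") as tmp:
    tmp.write(b"\x80\x04 not really XML")
    sample_path = tmp.name

try:
    read_as_utf8(sample_path)
    error_message = None
except UnicodeDecodeError as exc:
    # Decoding fails here, before any XML parser sees the content.
    error_message = str(exc)
finally:
    os.remove(sample_path)

print(error_message)
```

Because the failure happens during decoding, changing the XML library will not help; the question is what bytes the file actually contains.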

Sara
  • Please specify the UTF encoding when opening the file. – Himanshu Poddar Jul 05 '22 at 15:37
  • @HimanshuPoddar `open` used `UTF-8` by default, that's what caused the error. The file isn't UTF-8 – Panagiotis Kanavos Jul 05 '22 at 15:39
  • I wasn't talking about utf-8 encoding – Himanshu Poddar Jul 05 '22 at 15:40
  • What were you talking about? Why do you assume the file uses UTF16 or UTF32 instead of eg Latin1? – Panagiotis Kanavos Jul 05 '22 at 15:41
  • Does this answer your question? [Python: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte](https://stackoverflow.com/questions/62170614/python-unicodedecodeerror-utf-8-codec-cant-decode-byte-0x80-in-position-0) – Ulrich Eckhardt Jul 05 '22 at 15:45
  • Sarah, btw, this has nothing to do with XML, since that code is never called. See [mcve], too. Further, I found this answer by just searching for the error message online. – Ulrich Eckhardt Jul 05 '22 at 15:46
  • @UlrichEckhardt, thanks for trying to help me. I saw that answer, but it did not work for me. I think the problem is related to the xml.dump file I have, but I am not sure, and I do not know how to fix it. – Sara Jul 05 '22 at 15:50
  • @PanagiotisKanavos, I've noticed that this file is not UTF-8, but do you know what I should use instead? – Sara Jul 05 '22 at 15:53
  • @Sarah Ask whoever created the file. – Sören Jul 05 '22 at 15:58
  • We can't know what encoding was used for that file and why it wasn't UTF-8, the de facto standard, *especially* for XML files. Are you sure it's even XML? XML documents start with `<`. Either the file isn't XML at all, or it contains extra text before the XML part. – Panagiotis Kanavos Jul 05 '22 at 15:59
  • What is `XML.dump` anyway? That's not a common file format or even name, much less an XML file format. How was this file generated? Can you open it with a text editor? – Panagiotis Kanavos Jul 05 '22 at 16:02
  • @Sören Do you think that adding the correct encoding will fix this problem? – Sara Jul 05 '22 at 16:02
  • @PanagiotisKanavos, I am sure it was originally an XML file. Then someone did something to it (I think parsed it or similar), and now I have it as name.xml.dump. I tried to open it to see what is inside, but apparently I can't: the computer just crashed, I think because the file is too large. – Sara Jul 05 '22 at 16:06
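Since the file is too large to open in an editor, one way to follow up on the comment that XML documents start with `<` is to read only the first few bytes in binary mode, where no decoding happens. The sketch below compares them against a small, illustrative list of known leading byte sequences. The names `sniff` and `sniff_file` and the 16-byte header size are arbitrary choices for this example. Treating a leading `0x80` as a possible Python pickle is only a guess: pickles written with protocol 2 or later start with that byte, and `.dump` resembles `pickle.dump`, but nothing in the question confirms it.

```python
import codecs

# Known leading byte sequences (a partial, illustrative list; order matters).
SIGNATURES = [
    (codecs.BOM_UTF8, "XML/text with a UTF-8 byte order mark"),
    (codecs.BOM_UTF16_LE, "text with a UTF-16 LE byte order mark"),
    (codecs.BOM_UTF16_BE, "text with a UTF-16 BE byte order mark"),
    (b"<", "plain XML/text starting with '<'"),
    (b"\x1f\x8b", "gzip-compressed data"),
    (b"\x80", "possibly a Python pickle (protocol 2 or later)"),
]


def sniff(header: bytes) -> str:
    """Guess what a file contains from its first few bytes."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown format"


def sniff_file(path: str, size: int = 16) -> str:
    # Binary mode: the bytes are returned as-is, so no UnicodeDecodeError.
    with open(path, "rb") as fd:
        header = fd.read(size)
    print(header)  # the raw leading bytes, useful to post in the question
    return sniff(header)
```

Usage would be `print(sniff_file('file1.xml.dump'))`. If the result does suggest a pickle, keep in mind that `pickle.load` can execute arbitrary code and should only be used on files from a trusted source.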

1 Answer


I just ran into this error.

In my case it was caused by a float assigned to the `#text` node:

```python
'field': {
    '@attribute': 'm3',
    '#text': 10.076
}
```

The assignment itself is valid, but it leads to the encoding error.

The easiest fix is to assign the value as a string, for example with an f-string:

```python
'field': {
    '@attribute': 'm3',
    '#text': f'{10.076}'
}
```

So I would suggest reviewing your dictionary and verifying that all the values, such as the `#text` fields, are indeed strings.
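The check suggested above can be automated. Below is a minimal sketch, assuming the dictionary is later passed to `xmltodict.unparse`. The helper name `stringify_leaves` is hypothetical and not part of xmltodict. It walks nested dicts and lists and converts every non-string leaf to a string. `None` is left unchanged, and converting booleans to `'true'`/`'false'` is a choice made for this example.

```python
def stringify_leaves(node):
    """Return a copy of a nested dict/list structure with every leaf value as a string.

    Dicts and lists are walked recursively; strings and None are kept as-is,
    booleans become 'true'/'false', and other scalars (e.g. floats) go through str().
    """
    if isinstance(node, dict):
        return {key: stringify_leaves(value) for key, value in node.items()}
    if isinstance(node, list):
        return [stringify_leaves(item) for item in node]
    if node is None or isinstance(node, str):
        return node
    if isinstance(node, bool):  # checked before str() so True doesn't become 'True'
        return "true" if node else "false"
    return str(node)


doc = {'root': {'field': {'@attribute': 'm3', '#text': 10.076}}}
print(stringify_leaves(doc))
# {'root': {'field': {'@attribute': 'm3', '#text': '10.076'}}}
```

Returning a new structure instead of modifying the original keeps the input dictionary unchanged, which makes it easier to compare the two when looking for the offending field.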