I have a bytes object whose encoding I don't know, though I do know it is not UTF-8:

a.decode('utf-8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9a in position 0: invalid start byte

What I would like to do is something like:

for encoding in encodings:
    try:
        a.decode(encoding)
        print("This is it!", encoding)
    except Exception:
        pass

How can I get Python to give me a list of every encoding name that .decode accepts, so I can use it as encodings in the loop above?

cardamom
  • https://github.com/tripleee/8bit has code which does this. I recall seeing this kind of question several times before but cannot immediately find a good duplicate. – tripleee Feb 13 '19 at 10:06
  • BTW, you do not exit the for loop when you find the encoding. But no, many encodings could successfully decode a byte array. Most 1-byte encodings have no structure, so every one of them could decode your string. You need to be smarter, and possibly not reinvent the wheel. – Giacomo Catenazzi Feb 13 '19 at 15:06
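The point about single-byte encodings can be checked directly. This is a minimal sketch, assuming only the standard `latin-1` and `cp437` codecs: both assign a character to every one of the 256 byte values, so either one decodes any input without raising, whether or not the result is the intended text.

```python
# Every possible byte value, 0x00 through 0xFF.
every_byte = bytes(range(256))

# Neither call raises: both codecs define a character for all 256 bytes.
as_latin1 = every_byte.decode("latin-1")
as_cp437 = every_byte.decode("cp437")

# Both decodes "succeed", yet they disagree on what the text is.
print(len(as_latin1), len(as_cp437))
print("same text:", as_latin1 == as_cp437)
```

A successful decode therefore only rules encodings out; it does not identify the right one.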

1 Answer

You can get them like this:

import encodings.aliases

# Keys are alias names; values are the codec names they map to.
all_of_encodings = encodings.aliases.aliases.keys()

for encoding in all_of_encodings:
    pass  # do what you want
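As a rough sketch of how such a list could be plugged into the loop from the question: the helper name `candidate_encodings` is made up for illustration, and the choice to iterate over the values of the alias table (the codec names) rather than its keys is an assumption made here to avoid trying the same codec several times. Some entries are not text encodings or are unavailable on a given platform, so `LookupError` is caught alongside `UnicodeError`.

```python
import encodings.aliases


def candidate_encodings(data: bytes) -> list:
    """Return every codec name from the alias table that decodes `data` without error."""
    # Many aliases map to the same codec, so deduplicate the values with a set.
    names = sorted(set(encodings.aliases.aliases.values()))
    matches = []
    for name in names:
        try:
            data.decode(name)
        except (UnicodeError, LookupError):
            # UnicodeError: the bytes are not valid in this encoding.
            # LookupError: not a text encoding, or not available on this platform.
            continue
        matches.append(name)
    return matches


# The byte from the question's traceback.
candidates = candidate_encodings(b"\x9a")
print(len(candidates), "encodings decode these bytes without error")
```

Expect many matches rather than one, so the result is a shortlist to narrow down by other means, not an answer in itself.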
Mehrdad Pedramfar
  • As outlined in the proposed duplicate, this is actually not a good solution. – tripleee Feb 13 '19 at 10:08
  • This does not cover the entire [list of all aliases and standard encodings](https://docs.python.org/3.7/library/codecs.html#standard-encodings). I ran the code and it didn't match. – ingyhere Jan 28 '20 at 05:56