Here is what I tried:

I found this question: "How to guess the encoding of a document?"

So I tried part of the code from it:

def check_encoding(self, filename):
    data = open(self.input_path_value.get() + "/" + filename, 'r')
    if data.startswith(codecs.BOM_UTF16_LE):
        return True
    else:
        return False

But this fails: Python doesn't recognize startswith() on the object returned by open(). I only need to check the first few bytes of the document (where the BOM is located). My files can be as large as 9 GB, so I can't load the whole text into RAM.
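For reference, here is a minimal sketch of the kind of check I am after, assuming it is enough to look at the first four bytes of the file. The function name `has_utf16_le_bom` is hypothetical. It opens the file in binary mode so that the data is a `bytes` object (which does have `startswith()`), and it reads only the start of the file, so the file size should not matter. One caveat: a UTF-16 LE file whose first character after the BOM is U+0000 would be mistaken for UTF-32 LE here.

```python
import codecs


def has_utf16_le_bom(path):
    """Return True if the file at `path` starts with a UTF-16 LE BOM."""
    # Binary mode yields bytes, which support startswith().
    with open(path, "rb") as f:
        head = f.read(4)  # read at most 4 bytes, whatever the file size

    # The UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE),
    # so rule it out before testing for UTF-16 LE.
    if head.startswith(codecs.BOM_UTF32_LE):
        return False
    return head.startswith(codecs.BOM_UTF16_LE)
```

In the method above this could be called as `has_utf16_le_bom(self.input_path_value.get() + "/" + filename)`.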

I also tried to do something like:

try:
    data = open(self.input_path_value.get() + "/" + filename, 'r', encoding = 'utf-16-le')
    return True
except:
    return False

But this doesn't actually check whether there is a BOM, and sometimes it succeeds even though the file isn't really UTF-16 encoded.
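A small experiment that seems to reproduce this behaviour is sketched below. It assumes a throwaway file holding plain ASCII bytes; the file name and contents are made up for the demonstration. As far as I can tell, open() does not decode anything by itself, and most byte pairs are valid UTF-16 LE code units, so neither the open() nor the read() raises.

```python
import os
import tempfile

# Hypothetical demo file containing plain ASCII bytes, not UTF-16.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "wb") as f:
    f.write(b"hello!")

# open() itself decodes nothing, so no exception is raised here.
with open(path, "r", encoding="utf-16-le") as f:
    # Each pair of bytes maps to some code point, so read() succeeds too.
    text = f.read()

print(len(text))         # 3 characters instead of 6
print(text == "hello!")  # False: the content was silently misdecoded
```

So a try/except around open() says nothing about the real encoding of the file.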

Any ideas how to check this simply?

Baptiste Arnaud