Here is what I tried:
I found this question: "How to guess the encoding of a document?"
So I tried a piece of the code from it:
    def check_encoding(self, filename):
        data = open(self.input_path_value.get() + "/" + filename, 'r')
        if data.startswith(codecs.BOM_UTF16_LE):
            return True
        else:
            return False
But it doesn't recognize the startswith() function. I just need to check the first few bytes of the document (where the BOM is located). My files can be as large as 9 GB, so I can't load the whole text into RAM.
I also tried to do something like:
    try:
        data = open(self.input_path_value.get() + "/" + filename, 'r', encoding = 'utf-16-le')
        return True
    except:
        return False
But this doesn't actually check whether there's a BOM, and sometimes it succeeds even though the file isn't really UTF-16 encoded.
Any ideas how to check this simply?
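
For reference, here is a minimal sketch of the kind of check I have in mind. It assumes that opening the file in binary mode and reading only its first two bytes is enough, so the file size should not matter. The function name `has_utf16_le_bom` and the standalone `path` parameter are hypothetical, and the sketch only looks for the BOM. It does not verify that the rest of the file is valid UTF-16.

```python
import codecs


def has_utf16_le_bom(path):
    """Return True if the file at `path` starts with the UTF-16 LE BOM (FF FE)."""
    # Binary mode yields bytes, which can be compared with codecs.BOM_UTF16_LE.
    with open(path, "rb") as f:
        # Read only as many bytes as the BOM is long (2), never the whole file.
        head = f.read(len(codecs.BOM_UTF16_LE))
    # Caveat: the UTF-32 LE BOM (FF FE 00 00) begins with the same two bytes,
    # and this sketch does not tell the two apart.
    return head == codecs.BOM_UTF16_LE
```

In my case it would presumably be called as `has_utf16_le_bom(self.input_path_value.get() + "/" + filename)`.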