I need to check if my file starts with a BOM and is encoded in utf-16-le

Asked Mar 12 '18 at 09:46


Here is what I tried:

I found this doc: "How to guess the encoding of a document?"

So I tried a piece of its code:

import codecs

def check_encoding(self, filename):
    # open() returns a file object; nothing has been read from the file yet
    data = open(self.input_path_value.get() + "/" + filename, 'r')
    if data.startswith(codecs.BOM_UTF16_LE):
        return True
    else:
        return False

But this fails because `startswith()` isn't recognized. I only need to check the first characters of the document (where the BOM is located), and my files can be up to 9 GB, so I can't load the text into RAM.
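A minimal sketch of a BOM check that reads only the first bytes of the file, assuming the caller passes a complete path (the question builds it from `self.input_path_value.get()`; `os.path.join` would do the same job). The function name `starts_with_utf16_le_bom` is hypothetical. The file is opened in binary mode (`'rb'`), so the comparison is against raw bytes and nothing past the first two bytes is read, regardless of file size:

```python
import codecs


def starts_with_utf16_le_bom(path):
    """Return True if the file at `path` begins with the UTF-16-LE BOM.

    Hypothetical helper: only len(BOM) bytes are read, so this is cheap
    even for multi-gigabyte files.
    """
    bom = codecs.BOM_UTF16_LE  # b'\xff\xfe'
    # Binary mode: compare raw bytes, not text decoded with a default encoding.
    with open(path, "rb") as f:
        return f.read(len(bom)) == bom
```

Usage: `starts_with_utf16_le_bom(os.path.join(folder, filename))`. One caveat: the UTF-32-LE BOM (`codecs.BOM_UTF32_LE`, bytes `FF FE 00 00`) begins with the same two bytes, so a UTF-32-LE file would also pass this check.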

I also tried to do something like:

try:
    data = open(self.input_path_value.get() + "/" + filename, 'r', encoding = 'utf-16-le')
    return True
except:
    return False

But this doesn't actually check whether there's a BOM, and it sometimes succeeds even though the file isn't really UTF-16 encoded.
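The try/open attempt can't detect the encoding for two reasons, which this sketch illustrates (the helper names `opens_without_error` and `decodes_as_utf16_le` are hypothetical): `open()` with `encoding=` only sets up a text wrapper and decodes nothing until the file is read, and even when data is decoded, UTF-16-LE accepts almost any even-length byte sequence, so a successful decode says little about whether the file really is UTF-16.

```python
import os
import tempfile


def opens_without_error(path):
    """Hypothetical mimic of the try/open attempt: open() alone decodes nothing."""
    try:
        with open(path, "r", encoding="utf-16-le"):
            return True
    except OSError:
        return False


def decodes_as_utf16_le(data):
    """Return True if the bytes `data` decode as UTF-16-LE without error."""
    try:
        data.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


# A 5-byte ASCII file cannot be valid UTF-16 (odd number of bytes)...
path = os.path.join(tempfile.mkdtemp(), "ascii.txt")
with open(path, "wb") as f:
    f.write(b"hello")

print(opens_without_error(path))      # True: open() never looked at the content
print(decodes_as_utf16_le(b"hello"))  # False: truncated data (odd length)
# ...but even-length ASCII text "decodes" into unrelated characters.
print(decodes_as_utf16_le(b"hello!")) # True, although this is not UTF-16 text
```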

Any ideas how to check this simply?

edited Jun 20 '20 at 09:12 by Community

asked by Baptiste Arnaud

https://stackoverflow.com/questions/13590749/reading-unicode-file-data-with-bom-chars-in-python startswith seems to be ok for that – Daniel E. Mar 12 '18 at 09:51
You didn't read the data from the file. What you call `data` is a file object, not the content of the file and a file object doesn't have a `startswith` method. – Matthias Mar 12 '18 at 09:54
Indeed @Matthias it seems to work! – Baptiste Arnaud Mar 12 '18 at 09:56
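Checking the BOM only shows how a file claims to be encoded, while the question also mentions files that aren't really UTF-16. A minimal sketch that additionally streams the rest of the file through an incremental UTF-16-LE decoder, assuming a 1 MiB chunk size (an arbitrary choice) so memory stays bounded for multi-gigabyte files. The function name `is_utf16_le_with_bom` is hypothetical. A clean decode rules out some non-UTF-16 files (odd length, broken surrogate pairs) but cannot prove a file is UTF-16, and it reads the whole file, which takes time at 9 GB:

```python
import codecs


def is_utf16_le_with_bom(path, chunk_size=1 << 20):
    """Return True if `path` starts with the UTF-16-LE BOM and the rest of
    the file decodes as UTF-16-LE without errors.

    Hypothetical helper: the file is read in chunks of `chunk_size` bytes,
    so memory use does not grow with file size.
    """
    bom = codecs.BOM_UTF16_LE
    # The incremental decoder keeps state between calls, so a code unit or
    # surrogate pair split across a chunk boundary is handled correctly.
    decoder = codecs.getincrementaldecoder("utf-16-le")(errors="strict")
    with open(path, "rb") as f:
        if f.read(len(bom)) != bom:
            return False
        try:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                decoder.decode(chunk)
            # final=True makes the decoder reject leftover bytes, such as an
            # odd trailing byte or a dangling high surrogate at end of file.
            decoder.decode(b"", final=True)
        except UnicodeDecodeError:
            return False
    return True
```

Checking the BOM first means files without one are rejected after reading two bytes, so only plausible candidates pay the cost of a full scan.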
