My task is to check the encoding of a file. My problem is with the encoding formats that Python provides through its codecs.
I'm very new to Python, so I think I'm overlooking something.
I can't understand the following points:
When I check a file that is in utf-8 BOM format, the function tells me that it is utf-8.
When I check for the iso8859_6 format, it tells me that it couldn't recognize this format even though the file is in iso8859_6. But if I check for "cp720", it is able to recognize it.
According to this documentation, it should be able to recognize the iso8859_6 format.
I've searched the web for an understandable explanation but couldn't find anything.
import codecs
import io


class Format:
    def __init__(self, file_Name):
        self.file_Name = file_Name

    def check_coding(self):
        # Candidate encodings, tried in this order; the first one that decodes wins.
        encoding_formats = ['iso8859_6', 'utf-8', 'utf-8-sig', 'ascii']
        response = False
        for ex in encoding_formats:
            try:
                # Reading the whole file raises UnicodeDecodeError on a mismatch.
                with codecs.open(self.file_Name, 'r', encoding=ex) as fh:
                    fh.readlines()
            except UnicodeDecodeError:
                print('The supplied file is not encoded as %s' % ex)
                response = False
            else:
                print('The supplied file has the following encoding: %s ' % ex)
                response = True
                break
        return response
If the format of file_Name is utf-8 BOM, it shouldn't tell me it's utf-8.
If the format of file_Name is iso8859_6, it tells me that the file is not encoded in this format even though it is.
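The first point can be reproduced without a file. This is only a minimal sketch: the byte string `data` is a made-up sample, not my actual file. It suggests that the plain utf-8 codec accepts the BOM bytes and keeps them as the character U+FEFF, while utf-8-sig accepts the same bytes and strips the BOM:

```python
import codecs

# Hypothetical sample (an assumption, not the real file):
# the UTF-8 BOM followed by plain ASCII text.
data = codecs.BOM_UTF8 + b"hello"

# 'utf-8' decodes the BOM bytes without error and keeps them as U+FEFF.
as_utf8 = data.decode("utf-8")

# 'utf-8-sig' also decodes without error, but removes the leading BOM.
as_sig = data.decode("utf-8-sig")

print(repr(as_utf8))
print(repr(as_sig))
```

So a check that only tests whether decoding raises UnicodeDecodeError cannot tell these two formats apart. In my list 'utf-8' comes before 'utf-8-sig', so it matches first.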