7

I am trying to read a log file from a Python script. My program works fine on Linux, but I get an error on Windows. After reading some lines, it fails at a particular line with the following error:

  File "C:\Python\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 311: character maps to <undefined>

The following is the code I am using to read the file:

with open(log_file, 'r') as log_file_fh:
    for line in log_file_fh:
        print(line)

I have tried to fix it by using different encodings such as ascii, utf8, utf-8, ISO-8859-1, cp1252, and cp850, but I still face the same issue. Is there any way to fix this?

PM 2Ring
Ragini Dahihande
  • What _is_ the encoding of the file? – RemcoGerlich Feb 03 '17 at 07:41
  • I don't know the encoding of the file, but I think it's ANSI. One way I have seen to find a file's encoding is to open it in Notepad and use Save As; there I see ANSI. – Ragini Dahihande Feb 03 '17 at 07:43
  • I used the following link to find the encoding; it shows the encoding as Western: http://codeftw.blogspot.in/2009/07/how-to-find-character-encoding-of-text.html – Ragini Dahihande Feb 03 '17 at 07:48
  • 2
    The so-called ANSI encoding is [Windows-1252](https://en.wikipedia.org/wiki/Windows-1252) aka CP-1252. That error message says that your Windows system uses CP-1252 as the default encoding, but the file you're reading is _not_ CP-1252, so it fails to decode it to Unicode. You need to specify the actual encoding of the file. In Python 3, the easy way to do that is to pass the encoding as an argument in the `open` call. Try `with open(log_file, 'r', encoding="utf-8") as log_file_fh:`, and let us know what happens. – PM 2Ring Feb 03 '17 at 08:19
  • 1
    BTW, with Unicode questions you should _always_ mention which Python version you're using (preferably by including the appropriate tag) because Python 3 handles Unicode quite differently to Python 2. – PM 2Ring Feb 03 '17 at 08:53
  • I am currently using Python 3. `with open(log_file, 'r', encoding="utf-8") as log_file_fh:` does not work; I get an error with this too. – Ragini Dahihande Feb 03 '17 at 08:57
  • You may find this article helpful: [Pragmatic Unicode](http://nedbatchelder.com/text/unipain.html), which was written by SO veteran Ned Batchelder. – PM 2Ring Feb 03 '17 at 11:18
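
The error discussed above can be reproduced without the log file. What follows is only a minimal sketch: it assumes the offending byte is the 0x8f named in the traceback, and both the sample bytes and the helper name `try_decode` are made up for illustration.

```python
# Hypothetical sample: ASCII text around the single byte 0x8f from the traceback.
sample = b"status \x8f ok"


def try_decode(data, encoding):
    """Return the decoded text, or a short failure message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "FAILED: {}".format(exc.reason)


# Try the same bytes with several of the encodings mentioned in the question.
for name in ("cp1252", "utf-8", "latin-1", "cp850"):
    print(name, "->", repr(try_decode(sample, name)))
```

In this sketch cp1252 and utf-8 reject the byte, while latin-1 and cp850 accept it. latin-1 assigns a character to every byte value, so a decode that succeeds does not by itself prove that the chosen encoding is the right one.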

2 Answers

14

The log file I want to read from my Python script uses a Western European encoding. I referred to https://docs.python.org/2.4/lib/standard-encodings.html and used 'cp850' as the encoding, which worked for me:

with open(log_file, 'r', encoding='cp850') as log_file_fh:
    for line in log_file_fh:
        print(line)

However, that page lists many codecs for Western Europe. I don't think this is the correct solution, because most developers advise against using 'cp850'.

The best way I found to handle the encoding error is to add the errors argument when opening the file and set it to 'ignore'. Python then skips any bytes it cannot decode. In my case this is acceptable because I don't need the entire content of the file, only some specific log entries.

with open(log_file, 'r', errors='ignore') as log_file_fh:
    for line in log_file_fh:
        print(line)
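
To see what errors='ignore' actually does, here is a small sketch on in-memory bytes rather than the real log file (the sample bytes are invented, and cp1252 is assumed only because it is the codec named in the traceback). It also shows errors='replace', a standard alternative that marks each undecodable byte with U+FFFD instead of dropping it silently:

```python
# Hypothetical log fragment containing one byte that cp1252 cannot decode.
raw = b"disk \x8f full"

ignored = raw.decode("cp1252", errors="ignore")    # the bad byte is dropped
replaced = raw.decode("cp1252", errors="replace")  # the bad byte becomes U+FFFD

print(repr(ignored))
print(repr(replaced))
```

With 'ignore' there is no trace that data was lost, so 'replace' may be preferable when it matters whether a line was damaged.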
Ragini Dahihande
  • 1
    Well, if the file decodes correctly with `'cp850'` then you need to specify `'cp850'` as the encoding. However, it would be much better to fix the code that creates the log file in the first place, so that it is encoded with `'UTF-8'` instead of the ancient `'cp850'`. If you need help with that please **do not** modify this question. Instead, you need to create a new question, with a [mcve] that shows us how you are writing the log file. – PM 2Ring Feb 03 '17 at 09:52
  • My Python script works fine on Linux and Windows 10, but on Windows 7 I get the charmap error, which I resolved with 'cp850'. I referred to this question: http://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character. I also tried 'latin_1', and that worked for me too. – Ragini Dahihande Feb 03 '17 at 10:57
  • That sounds like the Windows systems have different default charmaps set. But that's really a Windows configuration problem, not a Python coding problem. Both cp850 and cp1252 are related to Latin1 (aka ISO-8859-1), but some of the characters in those code pages are different to Latin1; please see the Wikipedia articles for details. Because of those similarities, if (for example) you try to decode a cp850 file with Latin1 it may appear to work most of the time, but some of the special characters may be wrong. – PM 2Ring Feb 03 '17 at 11:14
  • As I said earlier, the best solution to this mess is to just use UTF-8 instead. However, that may be tricky to do correctly on Windows machines. – PM 2Ring Feb 03 '17 at 11:15
  • 1
    I had a similar issue but mine was flipped: my system was trying to read a UTF-8 file with a different encoding, and specifying the encoding as "utf-8" fixed it. +1 – RaKXeR Jan 17 '19 at 22:17
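
The point made in the comments above, that decoding with a related but wrong code page can appear to work while producing wrong characters, can be illustrated with a short sketch. The word used here is an arbitrary example, not taken from the log file:

```python
# Encode a word containing a non-ASCII letter (e with acute accent) as cp850.
data = "caf\u00e9".encode("cp850")

# Decode the same bytes with three related encodings; none of them raises,
# but only the original encoding gives back the original text.
for enc in ("cp850", "cp1252", "latin-1"):
    print(enc, "->", repr(data.decode(enc)))
```

All three decodes succeed, yet cp1252 yields a low quotation mark and latin-1 yields a control character in place of the accented letter.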
-2

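EDIT: Open your file in binary mode, as suggested, and then decode each line as UTF-8 in your code. This only works if the file really is UTF-8 encoded:

with open(log_file, 'rb') as log_file_fh:
    for line in log_file_fh:
        # In binary mode each line is a bytes object, so decode it explicitly.
        line = line.decode('utf-8')
        print(line)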
Roy Holzem
  • @raginidahihande This solution assumes that `log_file` uses the `utf-8` encoding, but for it to work correctly you _need_ to open the file in binary mode: `with open(log_file, 'rb')`. And of course even if you do that it _won't_ work correctly if `log_file` isn't encoded with `utf-8`. – PM 2Ring Feb 03 '17 at 08:30
  • @PM2Ring Can you write a new answer? Then I'll delete mine. – Roy Holzem Feb 03 '17 at 08:45
  • Just fix your answer to open the file in binary mode, and mention that it will only work if the file is UTF-8. I haven't written an answer because Ragini still hasn't told us if my suggestion in the question comment worked, so we don't know what the actual encoding is. But I agree that UTF-8 is a reasonable guess. :) – PM 2Ring Feb 03 '17 at 08:50
  • I don't want to read the file in binary mode; I want to read it in text mode. – Ragini Dahihande Feb 03 '17 at 08:58
  • @raginidahihande: You have two options. 1) Open the file in binary mode and then decode it to text using the correct encoding, as shown in this answer. 2) Open it in text mode, specifying the correct encoding in the open call, as shown in my comment. Either approach will give the same result. – PM 2Ring Feb 03 '17 at 09:48
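
The two options described in the last comment can be compared side by side. This is a sketch under stated assumptions: the temporary file, its contents, and the choice of cp850 are illustrative stand-ins for the real log file and its actual encoding.

```python
import os
import tempfile

# Write a small hypothetical log file encoded as cp850.
text = "S\u00e5 far ok\nline two\n"
fd, path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as fh:
    fh.write(text.encode("cp850"))

# Option 1: binary mode, decode each line explicitly.
with open(path, "rb") as fh:
    from_binary = [line.decode("cp850") for line in fh]

# Option 2: text mode, let open() decode with the given encoding.
with open(path, "r", encoding="cp850") as fh:
    from_text = list(fh)

os.remove(path)
print(from_binary == from_text)
```

One difference to keep in mind: text mode translates "\r\n" line endings to "\n" by default, while binary mode leaves them untouched, so files with Windows line endings will differ in that detail.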