
This is one of my own projects. It should later benefit other players of a game I play (AssaultCube). Its purpose is to break down the log file and make it easier for users to read.

I keep getting this error. Does anyone know how to fix it? I am not planning to write or create a file at the moment; I just want this error fixed.

The line that triggered the error is a blank line (it stopped on line 66346).

This is what the relevant part of my script looks like:

log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r')
for line in log:

and the exception is:

Traceback (most recent call last):
  File "C:\Users\Owner\Desktop\Exodus Logs\Log File Translater.py", line 159, in <module>
    main()
  File "C:\Users\Owner\Desktop\Exodus Logs\Log File Translater.py", line 7, in main
    for line in log:
  File "C:\Python32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3074: character maps to <undefined>
ASGM
Bugboy1028
  • What encoding is the file in? – Martijn Pieters May 13 '13 at 18:14
  • Strangely, this seems to occur only when I use one specific file. It also stops at a specific line. – Bugboy1028 May 13 '13 at 18:14
  • @martijnPieters, where can I find the encoding of the file? – Bugboy1028 May 13 '13 at 18:15
  • Your Windows default encoding is `cp1252` but the file is not using that encoding. – Martijn Pieters May 13 '13 at 18:15
  • @Bugboy1028 By definition, you cannot find an encoding in the *decoded* file itself. You always have to remember it alongside the file, or devise a detection scheme for your file format. – phihag May 13 '13 at 18:17
  • You can try [chardet](https://pypi.python.org/pypi/chardet) to guess the encoding – Thomas Fenzl May 13 '13 at 18:20
  • Very often, an 0x81 byte means you've got UTF-8 (or data that's been corrupted by an incorrect conversion between cp1252 and UTF-8). For example, see [here](http://www.i18nqa.com/debug/bug-double-conversion.html). And UTF-8 is very common nowadays, especially in cross-platform logfiles. But this is by no means guaranteed, and it doesn't really help you with the real problem, which is that your program has to know what charset to use, and you need to figure out how to tell it. If you can find out that the logs are always UTF-8, great; if not, knowing that this one might be doesn't help much… – abarnert May 13 '13 at 18:24
  • [Pragmatic Unicode _or_ How Do I Stop the Pain?](http://nedbatchelder.com/text/unipain.html) – Robᵩ May 13 '13 at 18:39

1 Answer


Try:

enc = 'utf-8'
log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r', encoding=enc)

If that doesn't work, try:

enc = 'utf-16'
log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r', encoding=enc)

You could also try:

enc = 'iso-8859-15'

or:

enc = 'cp437'

which is very old, but it also has "ü" at 0x81, which would fit the string "üßer" that I found on the AssaultCube homepage.
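To compare the candidates side by side rather than editing `enc` by hand each time, something like the following might help. It is only a sketch: `try_encodings` and the sample bytes are made up for illustration (the sample is what cp437 would store for "üßer"), and `cp1252` is included only because it is the codec named in the traceback.

```python
# Candidate encodings from this answer, plus cp1252 (the Windows default
# that raised the original error) for comparison.
CANDIDATES = ("cp1252", "utf-8", "utf-16", "iso-8859-15", "cp437")


def try_encodings(data, candidates=CANDIDATES):
    """Map each encoding name to the decoded text, or None if decoding fails."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# Hypothetical sample: the bytes cp437 would use for "üßer".
sample = b"\x81\xe1er"
results = try_encodings(sample)
for enc, text in results.items():
    print(enc, "->", repr(text))
```

Keep in mind that decoding without an error does not prove the encoding is right. iso-8859-15 and cp437 assign a character to every byte value, so they never raise, and utf-16 can also "succeed" with nonsense output. The decoded text still has to be checked by eye against what the log should contain.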

If all of these encodings are wrong, try contacting the AssaultCube developers or, as mentioned in a comment, have a look at https://pypi.python.org/pypi/chardet
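Before asking anyone, it may also be worth looking at the raw bytes around the failure. Opening the file in binary mode skips decoding entirely, so it cannot raise `UnicodeDecodeError`. A rough sketch (the helper name `find_non_ascii_lines` and its `limit` parameter are my own, not part of any library):

```python
def find_non_ascii_lines(path, limit=5):
    """Return up to `limit` (line number, raw bytes) pairs for lines that
    contain at least one byte outside the ASCII range."""
    hits = []
    with open(path, "rb") as f:  # binary mode: no decoding happens here
        for lineno, raw in enumerate(f, start=1):
            if any(b > 0x7F for b in raw):
                hits.append((lineno, raw))
                if len(hits) >= limit:
                    break
    return hits
```

Calling `find_non_ascii_lines(path)` with the path to your log and printing the results shows the offending bytes in context (for example as `\x81`), which usually makes it easier to guess the encoding. If the exact characters do not matter to you, `open(path, 'r', encoding='utf-8', errors='replace')` is another option: it substitutes a replacement character for undecodable bytes instead of raising.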

Robsdedude