
I have scripts that print messages through the logging system and sometimes through plain print calls. On the Windows console I get error messages like:

Traceback (most recent call last):
  File "C:\Python32\lib\logging\__init__.py", line 939, in emit
    stream.write(msg)
  File "C:\Python32\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 4537: character maps to <undefined>

Is there a general way to make all encodings in the logging system, print commands, etc. fail-safe (ignore errors)?

Gere

1 Answer


The problem is that your terminal/shell (cmd, as you are on Windows) cannot print every Unicode character; its code page, cp850 here, covers only a small subset.

You can encode your strings in a fail-safe way with the errors argument of the str.encode method. For example, you can replace unsupported characters with ? by setting errors='replace'.

>>> s = u'\u2019'
>>> print s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 0: character maps to <undefined>
>>> print s.encode('cp850', errors='replace')
?

See the documentation for other options.
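To make the other options concrete, here is a minimal sketch, assuming Python 3 and the cp850 code page from the question. The helper name encode_with is made up for this illustration; the four error handlers are the standard ones accepted by str.encode.

```python
def encode_with(text, errors, encoding="cp850"):
    """Encode text with the given error handler (hypothetical helper)."""
    return text.encode(encoding, errors=errors)


s = "It\u2019s"  # contains U+2019, which cp850 cannot represent

# Compare how each standard error handler treats the unsupported character.
for handler in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace"):
    print(handler, encode_with(s, handler))
```

'ignore' drops the character, 'replace' substitutes ?, and the last two keep the information in an escaped form, which is often more useful in log output.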

Edit: If you want a general solution for logging, you can subclass StreamHandler:

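import logging

class CustomStreamHandler(logging.StreamHandler):

    def format(self, record):
        # A LogRecord has no encode() method, so sanitize the formatted text.
        msg = logging.StreamHandler.format(self, record)
        # Round-trip through cp850: unsupported characters become '?'.
        return msg.encode('cp850', errors='replace').decode('cp850')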
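For reference, this is roughly how a handler of this kind could be attached and exercised. It is a sketch assuming Python 3: the class name ReplacingStreamHandler, its encoding parameter, and the "demo" logger are made up for the example, and a StringIO stands in for the console. It sanitizes the formatted message rather than the record, because a LogRecord is not a string.

```python
import io
import logging


class ReplacingStreamHandler(logging.StreamHandler):
    """Hypothetical handler: replaces characters the target encoding lacks."""

    def __init__(self, stream=None, encoding="cp850"):
        super().__init__(stream)
        self.target_encoding = encoding

    def format(self, record):
        msg = super().format(record)
        # Encode with errors='replace', then decode back to str so the
        # stream still receives text, with unsupported characters as '?'.
        raw = msg.encode(self.target_encoding, errors="replace")
        return raw.decode(self.target_encoding)


buffer = io.StringIO()  # stand-in for the console stream
logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example isolated from the root logger
logger.addHandler(ReplacingStreamHandler(buffer))

logger.info("It\u2019s done")
print(repr(buffer.getvalue()))
```

In a real script the handler would be created without the buffer argument so that it writes to sys.stderr as usual.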
schlamar
  • But if I pre-encode all strings, they change type (to bytes), which might change how they behave internally. Also, the error is raised in the built-in codec library, which I cannot change. Can I set an option in the codec? – Gere Jun 15 '12 at 12:35
  • Edited my answer with a general logging solution. – schlamar Jun 15 '12 at 12:53
  • And is there a general solution so that I don't have to change code at different places (substitute handlers)? Maybe some global option for encoding errors? – Gere Jun 16 '12 at 07:03
  • If you don't use multiple loggers (by using `getLogger`), you only have to set the handler once. If you use multiple loggers, you can use `setLoggerClass` with a custom class that uses the handler. – schlamar Jun 16 '12 at 10:54
  • This [answer](http://stackoverflow.com/a/17337953/307454) seems to get the job done, quite effectively. – lifebalance Jan 21 '14 at 15:30
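
On the global option asked about in the comments: one possible approach, sketched here under the assumption of Python 3, is to re-wrap the standard streams so that every write replaces unencodable characters. The helper name make_fail_safe is made up for this illustration, and this is not necessarily what the linked answer does.

```python
import io
import sys


def make_fail_safe(stream):
    """Return a text stream over the same bytes that never raises on encode."""
    return io.TextIOWrapper(
        stream.buffer,
        encoding=stream.encoding,
        errors="replace",  # unsupported characters are written as '?'
        line_buffering=True,
    )


# In a real script, done once at startup:
# sys.stdout = make_fail_safe(sys.stdout)
# sys.stderr = make_fail_safe(sys.stderr)

# Demonstration with an in-memory cp850 stream instead of the console.
demo = io.TextIOWrapper(io.BytesIO(), encoding="cp850")
safe = make_fail_safe(demo)
safe.write("It\u2019s")
safe.flush()
print(demo.buffer.getvalue())
```

A StreamHandler created without an explicit stream picks up sys.stderr at construction time, so the swap has to happen before logging is configured. Without touching code, setting the environment variable PYTHONIOENCODING=cp850:replace has a similar effect on the standard streams, and Python 3.7+ also offers sys.stdout.reconfigure(errors='replace').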