
Possible Duplicate:
Python, Unicode, and the Windows console

I read some strings from a file, and when I try to print these UTF-8 strings in the Windows console, I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

I've tried to set the console encoding to UTF-8 with `chcp 65001`, but then I get this error message:

LookupError: unknown encoding: cp65001
Meloun
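A minimal sketch of where the first error message comes from: byte 0xc4 is the lead byte of a UTF-8 encoded "Č", and the ASCII codec only accepts bytes in the range 0-127. The sample bytes are a hypothetical stand-in for the file contents. In Python 2 the same implicit ASCII decode happens, for example, when a byte string is mixed with a `unicode` string; that this is what triggers it in the asker's code is an assumption, since the code is not shown.

```python
# Hypothetical file contents: "Čeština" encoded as UTF-8 bytes.
data = b"\xc4\x8ce\xc5\xa1tina"

error_message = None
try:
    # Decoding with the wrong codec (ASCII) fails on the first non-ASCII byte.
    data.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with the file's real encoding yields a proper unicode string.
text = data.decode("utf-8")
```

`error_message` ends up holding the same text as the traceback in the question, while `text` holds the seven-character string.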
  • Fixed in Python 3.3. – Mark Ransom Apr 25 '12 at 18:31
  • Is there some workaround for Python 2.7? – Meloun Apr 25 '12 at 18:37
  • This question has come up a few times. Here's one example with a workaround that may or may not work: http://stackoverflow.com/questions/5419/python-unicode-and-the-windows-console – Mark Ransom Apr 25 '12 at 18:42
  • Check this out: http://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash/3259271 – marbdq Apr 26 '12 at 10:55
  • Daira Hopwood's answer on 878972 is the answer. On 2.7 it's *wontfix* (because backporting it seems an even bigger PITA than backporting it to 3.1), in 3.3 it's *added* but still buggy, even in *3.5*, due to Microsoft strangeness, and it depends on the current font, not just on `chcp 65001`. BTW, the relevant environment variable setting is `SET PYTHONIOENCODING=utf-8` (you can try `mbcs`, too), but neither will work, because cp65001 is buggy and the WinAPI is buggy (I am not sure what `mbcs` is supposed to do, but it won't help). – n611x007 Sep 08 '15 at 10:07
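`PYTHONIOENCODING` is read when the interpreter starts, so it has to be set before Python is launched (in cmd, run `set PYTHONIOENCODING=utf-8` first). A small sketch, assuming a Python that honors the variable (2.6+ or 3.x), that starts a child interpreter with it set and reports the encoding the child's stdout ended up with; `child_stdout_encoding` is a hypothetical helper name. As the comment above notes, this only changes what Python encodes to, not whether the console can render the result.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set; return its stdout encoding."""
    # Copy the current environment and override only PYTHONIOENCODING.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    # The encoding name itself is plain ASCII; strip the trailing newline.
    return out.decode("ascii").strip()
```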

2 Answers


I recommend checking similar questions on Stack Overflow; there are many of them.

Anyway, you can do it this way:

  1. Read from the file in whatever encoding it uses (for example UTF-8), but decode the strings to `unicode`.
  2. For the Windows console, output `unicode` strings. You don't need to encode them in this special case, and you don't need to set the console encoding; the output text will be encoded to the console's code page automatically.

For files, you need to use the `codecs` module or encode the strings in the proper encoding yourself.

Jiri
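The file-handling steps of this answer can be sketched with the `codecs` module it mentions. This is a minimal sketch, assuming the file is UTF-8; `read_lines` and `write_lines` are hypothetical helper names. The `u""` literals keep it valid on both Python 2.7 and 3.3+. `codecs.open` decodes on read and encodes on write, so the rest of the program only handles `unicode` strings.

```python
import codecs


def read_lines(path, encoding="utf-8"):
    """Read a text file and return its lines decoded to unicode strings."""
    with codecs.open(path, "r", encoding=encoding) as f:
        # Each line arrives already decoded from `encoding`.
        return [line.rstrip(u"\r\n") for line in f]


def write_lines(path, lines, encoding="utf-8"):
    """Write unicode strings to a file, encoding them to `encoding`."""
    with codecs.open(path, "w", encoding=encoding) as f:
        for line in lines:
            f.write(line + u"\n")
```

The strings returned by `read_lines` can then be passed straight to `print`, leaving the console encoding to Python as step 2 describes. On Python 3 the built-in `open(path, encoding="utf-8")` does the same job as `codecs.open`.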
  • Good advice, but it should be noted that if you're expecting multiple language support on the console this won't provide it. – Mark Ransom Apr 30 '12 at 14:42
  • Did this actually work for you? I get `LookupError: unknown encoding: cp65001` even before I read the first byte from the file. It seems totally unrelated to the string contents. It is as if Python does not understand `cp65001` but tries to use it anyway; if I had to guess, this will never work unless you work around it or use Python 3.3. – n611x007 Sep 08 '15 at 08:55
  • @naxa Yeah, Python does not understand cp65001. Do not `chcp` to 65001, or at least run `set PYTHONIOENCODING=utf-8` before calling Python. See also https://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash – Jiri Sep 08 '15 at 10:11
  • Thanks, I found it meanwhile: the basic idea is to use Python 3.3+ and expect it to still be buggy (on 2.7 it's wontfix), and to set the console font to "DejaVu Sans Mono" or an equivalent. :) – n611x007 Sep 08 '15 at 10:12

The `print` statement tries to convert Unicode strings to the encoding supported by the console. Try:

>>> import sys
>>> sys.stdout.encoding
'cp852'

It shows which encoding the console supports (more precisely, which encoding the console reports to Python). If a character cannot be converted to that encoding, there is no way to display it correctly.

pepr
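Following that reasoning, a program can check up front whether a string is representable in the reported encoding. A minimal sketch; `can_display` is a hypothetical helper, and falling back to `"ascii"` when `sys.stdout.encoding` is `None` (as it is on Python 2 when output is redirected) is an assumption of this sketch.

```python
import sys


def can_display(text, encoding=None):
    """Return True if every character in `text` can be encoded in `encoding`.

    Without an explicit encoding, use the one Python reports for stdout,
    falling back to ASCII when that is None (assumed default).
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True
```

For example, "Čeština" passes for `cp852` but fails for `cp437`. For text that fails the check, `text.encode(encoding, "replace")` substitutes `?` for the missing characters instead of raising.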