0

OK, I want to print a string in my Windows XP console. The console can't print several characters, so I have to encode to my stdout.encoding, which is 'cp437'. But when I print the encoded string, the 'ß' is printed as '\xe1'. After decoding back to Unicode and printing the string, I get the output I want, but this feels somewhat wrong. What is the correct way to print a string and get ? for non-printable characters?

>>> var
'Bla \u2013 großes'
>>> print(var)
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013'

>>> var.encode('cp437', 'replace')
b'Bla ? gro\xe1es'
>>> print(var.encode('cp437', 'replace'))
b'Bla ? gro\xe1es'

>>> var.encode('cp437', 'replace').decode('cp437')
'Bla ? großes'
>>> print(var.encode('cp437', 'replace').decode('cp437'))
Bla ? großes

Edit: @Mark Ransom: since I print a lot, I feel this makes the code pretty bloated :/

@eryksun: exactly what I was looking for. Thanks a lot!

johnson
  • You can set `sys.stdout = io.TextIOWrapper(sys.stdout.detach(), sys.stdout.encoding, 'replace')` to keep the same encoding, but with replacement. – Eryk Sun Feb 15 '15 at 01:12
  • I don't see anything wrong with the last line you have, it's exactly the answer I was going to write. – Mark Ransom Feb 15 '15 at 04:57
  • Save yourself a lot of trouble and just get a Python IDE that supports UTF-8. The console is pretty useless for anything but its default locale characters. – Mark Tolonen Feb 15 '15 at 06:42
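
The wrapper suggested in the first comment can be sketched as follows. This is only an illustration: the helper name `replacing_writer` is made up here, and the demo writes to an in-memory buffer rather than the real console so that the effect is visible on any platform.

```python
import io


def replacing_writer(binary_stream, encoding):
    """Wrap a binary stream so that unencodable characters become '?'.

    Hypothetical helper name; it only forwards to io.TextIOWrapper.
    """
    return io.TextIOWrapper(
        binary_stream,
        encoding=encoding,
        errors='replace',      # substitute '?' instead of raising UnicodeEncodeError
        line_buffering=True,   # flush after every newline, like an interactive console
    )


# Demo on an in-memory buffer that stands in for a cp437 console.
buffer = io.BytesIO()
out = replacing_writer(buffer, 'cp437')
print('Bla \u2013 großes', file=out)
out.flush()
encoded = buffer.getvalue()

# On a real console the equivalent would be (as in the comment above):
# sys.stdout = replacing_writer(sys.stdout.detach(), sys.stdout.encoding)
```

With this in place, plain `print(var)` calls need no per-call encode/decode round trip. On Python 3.7 and later, `sys.stdout.reconfigure(errors='replace')` should achieve the same thing without replacing the stream object.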

2 Answers

3

To print Unicode characters that can't be represented in the console code page, you could use the win-unicode-console Python package. It uses Unicode APIs such as ReadConsoleW()/WriteConsoleW() to read and write Unicode from and to the Windows console directly:

#!/usr/bin/env python3
import win_unicode_console

win_unicode_console.enable()
try:
    print('Bla \u2013 großes')
finally:
    win_unicode_console.disable()

Save it to a test_unicode.py file and run it:

C:\> py test_unicode.py

You should see:

Bla – großes

As a preferred alternative, you could use the run module (included in the package) to run an ordinary script with Unicode support enabled in the Windows console:

C:\> py -m run unmodified_script_that_prints_unicode.py

To install the win_unicode_console module, run:

C:\> pip install win-unicode-console

Make sure to select a font that can display Unicode characters in the Windows console.


To save the output of a Python script to a file, you could use the PYTHONIOENCODING environment variable:

C:\> set PYTHONIOENCODING=utf-8:backslashreplace
C:\> py unmodified_script_that_prints_unicode.py >output_utf8.txt

Do not hardcode the character encoding of your environment inside your script; print Unicode instead. The examples show that the same script can print to the console and to a file using different encodings and different methods.
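
To make the `encoding:errorhandler` part of PYTHONIOENCODING more concrete, here is a small sketch of what the standard error handlers do to the sample string. The helper name `round_trip` is hypothetical; it only mimics what the console would end up showing for a given encoding and handler.

```python
text = 'Bla \u2013 großes'


def round_trip(s, encoding, errors):
    """Encode with the given error handler, then decode again.

    Hypothetical helper: approximates what a terminal using `encoding`
    would display for `s`.
    """
    return s.encode(encoding, errors).decode(encoding)


# cp437 has no en dash (U+2013), so each handler substitutes something different.
replaced = round_trip(text, 'cp437', 'replace')
escaped = round_trip(text, 'cp437', 'backslashreplace')
xml_refs = round_trip(text, 'cp437', 'xmlcharrefreplace')

# UTF-8 can encode both characters, so nothing needs to be substituted.
utf8_bytes = text.encode('utf-8', 'backslashreplace')

print(replaced)
print(escaped)
print(xml_refs)
print(utf8_bytes)
```

With `utf-8:backslashreplace`, as in the command above, the handler only comes into play for text that UTF-8 itself cannot encode, such as lone surrogates.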

jfs
3

An alternative solution is to not use the crippled Windows console for general Unicode output. Tk text widgets (accessed as tkinter Text instances) handle all BMP characters, as long as the selected font does.

Since Idle uses tkinter, it can as well. Running an Idle editor file (call it tem.py) containing

print('Bla \u2013 großes')

prints the following in the Shell window.

Bla – großes

A file can be run through Idle from the console with -m and -r.

C:\>python -m idlelib -r c:/programs/python34/tem.py

This opens a Shell window and prints the same as above. Or you can create your own Tk window with a Label or Text widget.

Terry Jan Reedy