18

I'm working in WinXP 5.1.2600, writing a Python application involving Chinese pinyin, which has involved me in endless Unicode problems. Switching to Python 3.0 has solved many of them. But the print() function for console output is not Unicode-aware for some odd reason. Here's a teeny program.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
    
import sys

print('sys.stdout encoding is "' + sys.stdout.encoding + '"')
str1 = 'lüelā'
print(str1)

Output is (changing angle brackets to square brackets for readability):

    sys.stdout encoding is "cp1252"
    Traceback (most recent call last):
      File "TestPrintEncoding.py", line 22, in [module]
        print(str1)
      File "C:\Python30\lib\io.py", line 1491, in write
        b = encoder.encode(s)
      File "C:\Python30\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u0101' 
    in position 4: character maps to [undefined]

Note that ü = '\xfc' = 252 gives no problem since cp1252, an 8-bit code page, has a slot for it. But ā = '\u0101' is beyond 8 bits and has no cp1252 equivalent.
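The difference between the two characters can be checked without touching the console at all. The following is a minimal sketch; the helper name `can_encode` is made up for illustration and is not part of any library.

```python
def can_encode(char, encoding):
    """Return True if `char` has a representation in `encoding`."""
    try:
        char.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# ascii() keeps the output 7-bit clean, so this loop itself cannot
# trigger the console encoding error it is investigating.
for ch in "lüelā":
    print(ascii(ch), can_encode(ch, "cp1252"), can_encode(ch, "utf-8"))
```

Only the last character should report a failure for cp1252, which matches the position given in the traceback.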

Anyone have an idea how to change the encoding of sys.stdout to 'utf-8'? Bear in mind that Python 3.0 no longer uses the codecs module, if I understand the documentation right.


(Note that the coding specified by the "coding:" line is the coding of the source code, not of the console output. But thank you for your thoughts!)

wovano
bigturtle

5 Answers

15

The Windows command prompt (cmd.exe) cannot display the Unicode characters you are using, even though Python is handling them correctly internally. You need to use IDLE, Cygwin, or another program that can display Unicode correctly.

See this thread for a full explanation: http://www.nabble.com/unable-to-print-Unicode-characters-in-Python-3-td21670662.html

Brandon
  • cmd.exe can display Unicode characters if you use a font that contains them and change the code page to UTF-8 (you can do that with `CHCP 65001`). – smerlin Mar 23 '11 at 16:03
  • That doesn't really work reliably. Besides, MSDN recommends using UTF-16, the native encoding of all Windows NT operating systems. – dom0 May 18 '12 at 07:48
  • @csde_rats don't they use the older, fixed-width UCS-2 rather than UTF-16? – Kos Nov 06 '12 at 10:53
  • Yes and no. No and yes. Microsoft used UCS-2 a long time ago but switched over to UTF-16 at some point. Some functions are still not really compatible with UTF-16, especially on the kernel side of things. – dom0 Nov 06 '12 at 14:50
12

You may want to try setting the environment variable PYTHONIOENCODING to "utf_8". I have written a page on my ordeal with this problem.
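PYTHONIOENCODING takes effect when the interpreter starts. A rough in-process counterpart is to wrap the binary layer of a stream in a text layer with an explicit encoding. This is only a sketch: `utf8_writer` is a hypothetical helper name, and the demo writes into an in-memory buffer rather than the real console.

```python
import io


def utf8_writer(binary_stream):
    """Wrap a binary stream in a text layer that always encodes as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


# Demo against an in-memory buffer, so it behaves the same on any console.
demo = io.BytesIO()
out = utf8_writer(demo)
print("lüelā", file=out)
out.flush()

# Show the raw bytes that were produced, in an ASCII-safe form.
print(ascii(demo.getvalue()))
```

In a script, the same idea would look like `sys.stdout = utf8_writer(sys.stdout.buffer)`. That only stops the UnicodeEncodeError; whether the console then renders the bytes correctly still depends on its code page and font.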

daveagp
2

Check out the question and answer here, I think they have some valuable clues. Specifically, note the setdefaultencoding in the sys module, but also the fact that you probably shouldn't use it.

Community
itsadok
1

The problem of displaying Unicode characters in Python on Windows is a known one. There is no official solution yet. The right thing to do is to use the WinAPI function WriteConsoleW. Building a working solution is nontrivial because there are other related issues. However, I have developed a package that tries to fix Python in this respect. See https://github.com/Drekin/win-unicode-console, where you can also read a deeper explanation of the problem. The package is also on PyPI (https://pypi.python.org/pypi/win_unicode_console) and can be installed using pip.

user87690
  • upvote, `py -mpip install win-unicode-console & py -mrun your_script.py` is the solution for printing Unicode to Windows console with cmd.exe on Python 3 (make sure you've configured appropriate fonts for the console window). – jfs Apr 11 '15 at 14:02
  • @J.F.Sebastian Using `run` is now considered suboptimal. `run` was needed when I didn't know about custom readline hooks. `win_unicode_console.enable()` is enough and it can be put to `sitecustomize` so it is run automatically. Then you can run your script as usual: `py your_script.py`. – user87690 Apr 11 '15 at 16:52
  • I don't want win-unicode-console code in my script (`py -mrun` allows me that). I often run the same script on Python 2 on Unix where `print(unicode_text)` works as is. Modifying `sitecustomize` module is too intrusive for me. It may affect unrelated code. To redirect output to a file I set PYTHONIOENCODING and run `py your_script.py > output.txt`. – jfs Apr 11 '15 at 17:32
1

Here's a dirty hack:

# works
import os
os.system("chcp 65001 &")
print("юникод")

However, almost anything breaks it:

  • simply muting the output of the first line already breaks it:

    # doesn't work
    import os
    os.system("chcp 65001 >nul &")
    print("юникод")
    
  • checking for OS type breaks it:

    # doesn't work
    import os
    if os.name == "nt":
        os.system("chcp 65001 &")
    
    print("юникод")
    
  • it doesn't even work with the print inside the if block:

    # doesn't work
    import os
    if os.name == "nt":
        os.system("chcp 65001 &")
        print("юникод")
    

But one can print with cmd's echo:

# works
import os
os.system("chcp 65001 & echo {0}".format("юникод"))

and here's a simple way to make this cross-platform:

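# works

import os

def simple_cross_platform_print(obj):
    if os.name == "nt":
        # Switch the code page and let cmd's echo do the printing.
        # Note: obj is passed to the shell unescaped.
        os.system("chcp 65001 >nul & echo {0}".format(obj))
    else:
        print(obj)

simple_cross_platform_print("юникод")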

but the trailing empty line that Windows' echo prints can't be suppressed.

Adobe