Why does Python 2.7 on Windows need a space before a Unicode character when printing?

Question

I am using cmd on Windows with chcp 65001. This is my code:

print u'\u0110 \u0110' + '\n'

Result:

(a character cmd can't display) (the character I want)
Traceback (most recent call last):
  File "b.py", line 26, in <module>
    print u'\u0110 \u0110'
IOError: [Errno 2] No such file or directory

But when I use this code:

print u' \u0110 \u0110' + '\n'

Result:

(a space)(the character I want) (the character I want)
Traceback (most recent call last):
  File "b.py", line 26, in <module>
    print u' \u0110 \u0110' + '\n'
IOError: [Errno 2] No such file or directory


My questions are:

Why does Python 2.7 need a space before the first Unicode character when printing?
How do I fix IOError: [Errno 2]?

In Control Panel, Region and Language, Administrative, what is your "Current language for non-Unicode programs"? Also, what does `chcp` return from the console? These settings affect how Unicode strings are encoded to the console, and vary between internationalized versions of Windows and affect our ability to reproduce your error. I can't think of a reason a space would make a difference, though. — Mark Tolonen, Jun 19 '15 at 02:49
I suspect you have a corrupted installation, to get an `IOError` on `print`. — Mark Tolonen, Jun 19 '15 at 02:54
@MarkTolonen That's a known problem on python 2.7/Windows. It can't handle cp 65001. — roeland, Jun 23 '15 at 05:30
@roeland, yes, I know, but I get `LookupError: unknown encoding: cp65001` on Windows with Python 2.7.9, not an `IOError`, so I still wonder how to reproduce the OP's exact error. — Mark Tolonen, Jun 24 '15 at 06:01
You can use the lines `import codecs`, `codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)` to alias cp65001 to UTF-8. Alternatively you can reproduce it with `sys.stdout.write(u"a→b".encode("utf-8"))` — roeland, Jun 24 '15 at 21:36
related: [Python, Unicode, and the Windows console](https://stackoverflow.com/q/5419/4279) — jfs, Jul 13 '17 at 12:54

score 5 · Accepted Answer · edited May 23 '17 at 10:32

Short answer

On Windows you can't print arbitrary strings using print.

There are some workarounds, as shown in How to make python 3 print() utf8. Despite the title of that question, you can't use them to actually print UTF-8 under code page 65001: the console repeats the last few bytes after finishing (as described further down).

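Example:

#! python2
import sys

# Encoding of the console's current code page.
# It is None when output is redirected, so fall back to ASCII.
enc = sys.stdout.encoding or 'ascii'

def outputUnicode(t):
    # 'replace' substitutes '?' for characters the code page lacks,
    # instead of raising UnicodeEncodeError.
    data = t.encode(enc, 'replace')
    sys.stdout.write(data)

outputUnicode(u'The letter \u0110\n')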
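To see what the 'replace' handler does without needing a Windows console, the encoding step can be run on its own. This is a small sketch that assumes code page 850 as the target (the code page used later in this answer) and compares 'replace' with two other standard error handlers:

```python
text = u'The letter \u0110\n'

# cp850 has no mapping for U+0110, so each handler substitutes differently.
replaced = text.encode('cp850', 'replace')            # uses '?'
escaped = text.encode('cp850', 'backslashreplace')    # uses a \uXXXX escape
as_xml = text.encode('cp850', 'xmlcharrefreplace')    # uses &#NNN;

for result in (replaced, escaped, as_xml):
    print(repr(result))
```

'backslashreplace' may be the more useful choice for debugging, since the original code point stays visible in the output.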

Long answer

You can use chcp to switch the console to a code page that contains the characters you want to print. In your case, for instance, run chcp 852.

These are the results on my box when I print the following strings. I'm using code page 850, which is the default for English systems:

u"\u00abHello\u00bb"  # "«Hello»" 
u"\u0110"  # "Đ"
u"\u4f60\u597d"  # "你好"
u"a\u2192b\u2192c"  # "a→b→c"

The first string prints correctly, since all of its characters are in code page 850. The next three fail with an error such as:

UnicodeEncodeError: 'charmap' codec can't encode character u'\u0110' in position 0: character maps to <undefined>

Change the code page to 852 and the second command will work.
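The same outcome can be checked without touching the console, by encoding the four strings with Python's cp850 and cp852 codecs. This is only a sketch of the encoding step; the helper name can_encode is made up for illustration:

```python
samples = [
    u"\u00abHello\u00bb",   # guillemets around Hello
    u"\u0110",              # D with stroke
    u"\u4f60\u597d",        # two Chinese characters
    u"a\u2192b\u2192c",     # arrows between letters
]

def can_encode(text, code_page):
    """Return True if every character of text exists in code_page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False

results = {cp: [can_encode(s, cp) for s in samples]
           for cp in ('cp850', 'cp852')}
for cp in ('cp850', 'cp852'):
    print('%s: %r' % (cp, results[cp]))
```

Only the first string encodes in cp850, and the first two encode in cp852. The Chinese text and the arrows fit in neither code page.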

There is a UTF-8 code page (65001), but it doesn't work with Python 2.7.

In Python 3.4 the results are the same. If you change the code page to 65001, you'll get slightly less broken behaviour:

\Python34\python.exe -c "print(u'a\u2192b\u2192c')"
a→b→c
�c
C:\>

The two extra characters (�c) are a consequence of non-standard behaviour in the C standard library on Windows. They're a repeat of the last 2 bytes in the UTF-8 encoding of the string.
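One way to see where those two characters could come from is to simulate the mismatch. The sketch below is not a trace of what the C runtime does. It assumes the write call reports the number of characters written rather than the number of bytes (as noted in the comments), that the caller then resends whatever it believes is still unwritten, and that the trailing newline is expanded to \r\n. The helper name retried_tail is made up for illustration:

```python
def retried_tail(text):
    """Bytes a caller would resend if the write reported characters, not bytes."""
    data = text.encode('utf-8')
    reported = len(text)      # assumption: count of characters, not bytes
    return data[reported:]    # the part the caller believes is still unwritten

# print() appends a newline; assumed here to be expanded to \r\n on Windows.
tail = retried_tail(u'a\u2192b\u2192c\r\n')
print(repr(tail))
```

Under these assumptions the resent tail is the last byte of the second arrow (0x92, which cannot be decoded on its own), the letter c, and the line ending. That matches the stray output shown above.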

I did use chcp 65001, but I still got that error, and I can't print without a space before the first character — pc43, Jun 18 '15 at 22:14
Code page 65001 is broken; programs need to use special hacks to be able to output Unicode on this code page. — roeland, Jun 18 '15 at 22:28
For the people who are familiar with the C standard IO library: `fwrite` will return the number of *characters*, instead of *bytes* in the string. — roeland, Jun 18 '15 at 22:28
@PhamThanh Yes, that's another bug when using code page 65001, and it's also reproducible with that C `fwrite` function. — roeland, Jun 18 '15 at 22:44
[you can print arbitrary Unicode string whatever `chcp` is](http://stackoverflow.com/a/30982765/4279) — jfs, Jun 22 '15 at 14:38
@J.F.Sebastian Aha good point. Didn't think about overriding stdin. — roeland, Jun 22 '15 at 21:44

score 5 · Answer 2 · edited May 23 '17 at 12:03

On Windows you can print arbitrary strings using print (as long as the font can display the characters). Just print Unicode and configure your environment.

For example, print_unicode.py:

#!/usr/bin/env python
print(u'\u0110\u0110')

To print to the Windows console, you could use the win-unicode-console package:

T:\> py -mpip install win-unicode-console
T:\> py -mrun print_unicode.py

Don't forget to configure an appropriate console font. The value that chcp returns does not matter in this case.

You can also call the WriteConsoleW() function (Unicode API) manually to print arbitrary text to the Windows console.
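A minimal sketch of such a manual call through ctypes follows. The helper name write_console is made up for illustration. It assumes stdout is attached to a real console (WriteConsoleW fails when output is redirected to a file or pipe) and it does not split very long strings into chunks:

```python
import sys

def write_console(text):
    """Write a Unicode string with WriteConsoleW; return True on success."""
    if sys.platform != 'win32':
        return False  # the console API exists only on Windows
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE,                 # hConsoleOutput
        wintypes.LPCWSTR,                # lpBuffer
        wintypes.DWORD,                  # nNumberOfCharsToWrite
        ctypes.POINTER(wintypes.DWORD),  # lpNumberOfCharsWritten
        wintypes.LPVOID,                 # lpReserved, must be NULL
    ]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    STD_OUTPUT_HANDLE = -11
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    written = wintypes.DWORD(0)
    # The length is counted in UTF-16 code units, not code points.
    n_units = len(text.encode('utf-16-le')) // 2
    ok = kernel32.WriteConsoleW(handle, text, n_units,
                                ctypes.byref(written), None)
    return bool(ok)

write_console(u'\u0110\u0110\n')
```

Because this path bypasses the byte-oriented stdout entirely, the active code page plays no part in it. Whether the characters are drawn correctly still depends on the console font.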

You don't need third-party modules to redirect output to a file:

T:\> set PYTHONIOENCODING=utf-8
T:\> py print_unicode.py >output-utf-8.txt

Note: the run module is not used here. This works on both Python 2 and 3.
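The effect of PYTHONIOENCODING on redirected output can be checked from Python itself. This sketch starts a child interpreter with the variable set and captures its stdout through a pipe instead of a file. The helper name run_with_utf8_stdout is made up for illustration:

```python
import os
import subprocess
import sys

def run_with_utf8_stdout(code):
    """Run code in a child interpreter whose stdout is encoded as UTF-8."""
    env = dict(os.environ, PYTHONIOENCODING='utf-8')
    return subprocess.check_output([sys.executable, '-c', code], env=env)

# The child interprets the \u escapes; the raw string passes them through as-is.
out = run_with_utf8_stdout(r"print(u'\u0110\u0110')")
print(repr(out))
```

The captured bytes are the UTF-8 encoding of the two characters (0xC4 0x90 twice) followed by the platform's line ending.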

If you don't need to print non-BMP Unicode characters, you could use Python's IDLE from the stdlib, e.g., in Python 3:

T:\> py -3 -midlelib -r print_unicode.py

IDLE is also available on Python 2 but the invocation is different.
