Let's say

s = u"test\u0627\u0644\u0644\u0647 \u0623\u0643\u0628\u0631\u7206\u767A\u043E\u043B\u043E\u043B\u043E"

If I try to print it directly,

>>> print s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'cp932' codec can't encode character u'\u0627' in position 4: illegal multibyte sequence
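
The same failure can be reproduced on any platform by encoding explicitly with the console's codec instead of printing. A minimal sketch, written in Python 3 syntax (the session above is Python 2); `first_unencodable` is a hypothetical helper name, not something from the standard library:

```python
s = "test\u0627\u0644\u0644\u0647 \u0623\u0643\u0628\u0631\u7206\u767A\u043E\u043B\u043E\u043B\u043E"


def first_unencodable(text, encoding):
    """Return the index of the first character `encoding` cannot represent, or None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.start is the position reported in the traceback message.
        return exc.start
    return None


print(first_unencodable(s, "cp932"))  # 4, the position named in the traceback
print(first_unencodable(s, "utf-8"))  # None: UTF-8 can encode every character

# errors="replace" substitutes "?" for unencodable characters instead of raising.
print(s.encode("cp932", errors="replace"))
```

So the error says nothing about the string itself, only that cp932 has no representation for the Arabic letters.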

So I switch the console code page to UTF-8 (65001) from within Python (otherwise the console won't understand my input).

import win32console
win32console.SetConsoleOutputCP(65001)
win32console.SetConsoleCP(65001)

Then I output the string encoded as UTF-8, because Python doesn't know that code page 65001 (as set by chcp 65001) is UTF-8 (a known bug).

>>> print s.encode('utf-8')
testالله أكبر爆発ололоTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 0] Error

As you can see, it prints the text successfully, but when it reaches the newline it raises an IOError.

The following workaround works:

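def safe_print(str):
    try:
        print str.encode('utf-8')
    except IOError:
        # swallow only the console error shown above
        pass
    # emit the newline that the failed print did not complete
    print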

>>> safe_print(s)
testالله أكبر爆発ололо

But there must be a better way. Any suggestions?

Nikolai
  • I hope you don't actually call the argument `str`. Avoid shadowing builtins. – Chris Morgan Aug 16 '11 at 15:46
  • @Chris: How is one supposed to know what is a builtin and what isn’t? It’s a very natural thing to do. How can you guarantee clean namespace behavior without requiring universal knowledge for starting? – tchrist Aug 16 '11 at 19:53
  • In this case, though, it is potentially very confusing, as the `str` type does have an encode method. – agf Aug 17 '11 at 10:30
  • @tchrist - Most programming editors with a python mode should highlight builtins in a different colour. This is the easiest way to make sure you don't accidentally use one as a variable or argument name. – DaveP Aug 22 '11 at 06:02
  • @DaveP: I've never used a colorified editor in my life. I find that languages that require IDEs to program in are just too hard. A person should be able to do it on their own without a program as a crutch. Too fragile and dangerous otherwise. – tchrist Aug 22 '11 at 13:08
  • @tchrist: If you never use syntax hiliting, you are making your life harder than it needs to be. It catches a lot of small problems, such as *ahem* shadowing built-ins and unclosed comments/strings. Too fragile and dangerous otherwise. ;-) – marcus Sep 09 '11 at 15:35
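
Following the point about shadowing made in the comments, here is a sketch of the question's workaround with the argument renamed and only IOError swallowed. It is written to run on Python 3 as well, so it writes bytes to a stream rather than using the print statement. The `stream` parameter is an addition of this sketch (it makes the function testable), and whether it behaves any better on the affected Windows console is untested:

```python
import io
import sys


def safe_print(text, stream=None):
    """Write `text` as UTF-8 followed by a newline, ignoring IOError on the text."""
    if stream is None:
        # Python 3 exposes the byte stream as sys.stdout.buffer;
        # Python 2's sys.stdout accepts byte strings directly.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    try:
        stream.write(text.encode("utf-8"))
    except IOError:
        # Same idea as the original workaround: ignore the console error.
        pass
    # Emit the newline separately, as the original does with a bare print.
    stream.write(b"\n")
    stream.flush()


# Demonstration on an in-memory stream instead of the real console.
demo = io.BytesIO()
safe_print(u"test\u0627\u0644\u0644\u0647", stream=demo)
```

Unlike a bare `except:`, this still lets unrelated errors (for example a wrong argument type) surface.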

2 Answers

Searching SO for "python utf8 windows" returns, as the first result, the question "Getting python to print in UTF8 on Windows XP with the console", which describes the problem with printing UTF-8 on Windows from Python.
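
One workaround often suggested for the unknown-codec part of the problem is to register `cp65001` as an alias for UTF-8 through `codecs.register`. A minimal sketch, assuming the only goal is to make the codec name resolvable; `register_cp65001_alias` is a hypothetical name, and this does not by itself address the IOError from the question:

```python
import codecs


def register_cp65001_alias():
    """Make the codec name 'cp65001' resolve to UTF-8 if Python doesn't know it.

    Returns True if an alias was registered, False if the name already resolved.
    """
    try:
        codecs.lookup("cp65001")
        return False  # already known; nothing to do
    except LookupError:
        pass

    def search(name):
        # Search functions receive the normalized (lowercase) codec name.
        return codecs.lookup("utf-8") if name == "cp65001" else None

    codecs.register(search)
    return True


register_cp65001_alias()
encoded = u"\u0627\u0644\u0644\u0647".encode("cp65001")
```

On Python versions that already resolve `cp65001`, the function does nothing.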

Piotr Dobrogost

I haven't tested it on Windows, but here you can get a small initialization script for both Windows and Linux that sets up the output encoding properly, including the logging interface. The module also colors the output (and updates the 'logging' interface to match), but you can easily cut out any functionality you don't need :-).

How to invoke the non-colored variant:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from setupcon import setup_console
setup_console('utf-8', False)

and the colored variant:

import setupcon
setupcon.setup_console()
import logging
#...
if setupcon.ansi:
    logging.getLogger().addHandler(setupcon.ColoredHandler())

If the solution works for you, you can read the documentation (in Russian) here: http://habrahabr.ru/blogs/python/117236/, or I or somebody else can translate it for you on request :-).
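
For readers who only need the encoding part, the general idea of wrapping an output stream so that text is encoded as UTF-8 can be sketched with the standard library alone. This is not a description of how setupcon is implemented; `make_utf8_writer` is a hypothetical name, and the demonstration uses an in-memory stream rather than the real console:

```python
import codecs
import io


def make_utf8_writer(binary_stream):
    """Wrap a byte stream so that text written to it is encoded as UTF-8."""
    return codecs.getwriter("utf-8")(binary_stream)


# Demonstration on an in-memory byte stream.
buf = io.BytesIO()
out = make_utf8_writer(buf)
out.write(u"test\u0627\u0644\u0644\u0647\n")
```

For real output one would pass `sys.stdout` on Python 2 or `sys.stdout.buffer` on Python 3 instead of `buf`.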

dmitry_romanov