
I'm using Python 2.6 on Windows 7

I borrowed some code from here: Python, Unicode, and the Windows console

My goal is to be able to display UTF-8 strings in the Windows console.

Apparently, in Python 2.6, `sys.setdefaultencoding()` is no longer supported.

However, when I call `reload(sys)` before using it, it magically doesn't raise an error.

This code doesn't raise an error, but it shows garbled characters instead of Japanese text. I believe the problem is that I haven't successfully changed the code page of the Windows console.

These are my attempts, but they don't work:

```python
reload(sys)
sys.setdefaultencoding('utf-8')

print os.popen('chcp 65001').read()

sys.stdout.encoding = 'cp65001'
```

Perhaps you can use win32console to change the code page? I tried the code from the page I linked, but it also raised an error from win32console. Maybe that code is obsolete.

Here's my code, which doesn't raise an error but prints garbled characters:

```python
# -*- coding: utf-8 -*-
import os
import sys
import codecs

reload(sys)
sys.setdefaultencoding('utf-8')
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)

#print os.popen('chcp 65001').read()
print(sys.stdout.encoding)
sys.stdout.encoding = 'cp65001'
print(sys.stdout.encoding)

x = raw_input('press enter to continue')

a = 'こんにちは世界'#.decode('utf8')
print a

x = raw_input()
```
russo

4 Answers


I know you state you're using Python 2.6, but if you're able to use Python 3.3 you'll find that this is finally supported.

Use the command chcp 65001 before starting Python.

See http://docs.python.org/dev/whatsnew/3.3.html#codecs

In Python 3.6 it's no longer even necessary to use the chcp command, since Python bypasses the byte-level console interface entirely and uses a native Unicode interface instead. See PEP 528: Change Windows console encoding to UTF-8.

As noted in the comments by @mbomb007, it's also important to make sure the console is configured with a font that supports the characters you're trying to display.
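PEP 528 only changes output that goes to an actual console window. If stdout is redirected to a file or a pipe, Python 3 still encodes it with the locale's code page (unless `PYTHONIOENCODING` or UTF-8 mode is set). A minimal sketch for forcing UTF-8 in that case, assuming Python 3.7+ where `io.TextIOWrapper.reconfigure()` exists; the helper name `ensure_utf8` is hypothetical:

```python
import io
import sys


def ensure_utf8(stream=None):
    """Switch a text stream (default: sys.stdout) to UTF-8 if needed.

    Hypothetical helper; needs Python 3.7+ for TextIOWrapper.reconfigure().
    Streams that are not TextIOWrapper instances are returned unchanged.
    """
    if stream is None:
        stream = sys.stdout
    if isinstance(stream, io.TextIOWrapper) and hasattr(stream, "reconfigure"):
        current = (stream.encoding or "").lower().replace("-", "").replace("_", "")
        if current != "utf8":
            # Changing the encoding is allowed after writing,
            # but not once data has been read from the stream.
            stream.reconfigure(encoding="utf-8")
    return stream
```

Calling `ensure_utf8()` once at startup makes a later `print("こんにちは世界")` emit UTF-8 bytes even when output is redirected. Whether a console window then shows the text correctly still depends on the font, as the comments point out.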

Mark Ransom
  • The font that the console is using must also support the characters being output, or they'll show up as squares. If Japanese is the goal, the font "NSimSun" supports them. In Windows 10, this font is in the list on the cmd properties page. – mbomb007 Mar 01 '18 at 16:21
  • @mbomb007 Great comment, I added something like that to the answer. – Mark Ransom Mar 01 '18 at 16:37
  • I was having the same problem (with Japanese chars), and I spent at least an hour trying to get it to work. It's nice that a built-in font supports them. – mbomb007 Mar 01 '18 at 20:44

Never ever ever use `setdefaultencoding`. If you want to write unicode strings to stdio, encode them explicitly. Monkeying around with `setdefaultencoding` will cause stdlib modules and third-party modules alike to break in horrible, subtle ways by allowing implicit conversion between `str` and `unicode` when it shouldn't happen.

Yes, the problem is most likely that your code page isn't set properly. However, using `os.popen` won't change the code page: it spawns a new shell, changes that shell's code page, and then exits immediately without affecting your console at all. I'm not personally very familiar with Windows, so I can't tell you how to change your console's code page from within your Python program.

As mentioned before, the proper way to display Unicode data as UTF-8 from Python is to encode your strings explicitly before printing them: `print s.encode('utf-8')`
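`print s.encode('utf-8')` is Python 2 syntax. In Python 3 the same idea (encode explicitly, then write bytes) goes through the binary buffer underneath `sys.stdout`. A minimal sketch, assuming Python 3; `print_encoded` is a hypothetical helper, and the bytes only display correctly if the console decodes them with the same encoding:

```python
import io
import sys


def print_encoded(text, encoding="utf-8", stream=None):
    """Encode text explicitly, then write the bytes plus a newline.

    Roughly the Python 3 counterpart of Python 2's ``print s.encode('utf-8')``.
    Hypothetical helper; ``stream`` must be a binary stream.
    """
    if stream is None:
        sys.stdout.flush()          # keep ordering with earlier text-mode prints
        stream = sys.stdout.buffer  # the binary layer underneath sys.stdout
    if isinstance(stream, io.TextIOBase):
        raise TypeError("need a binary stream such as sys.stdout.buffer")
    # Explicit encode: raises UnicodeEncodeError instead of silently guessing.
    data = (text + "\n").encode(encoding)
    stream.write(data)
    stream.flush()
    return data
```

For example, `print_encoded("こんにちは世界", encoding="cp932")` writes Shift-JIS bytes, which suits a console whose code page is 932.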

habnabit
  • Regarding "Never ever ever use setdefaultencoding." I do not think your reasoning for this is valid - it is insufficient at best. In fact, it is OK to set it to 'utf-8' as ascii is only a subset of it. If by setting it any problem arises in a module, it is the bug of the module. If you oppose, could you show us counterexamples? – OTZ Aug 27 '10 at 03:59
  • @otz, the stdlib and many, many third-party libraries assume ASCII is the default python encoding. There's a good discussion of why setting the default encoding is silly here: http://faassen.n--tree.net/blog/view/weblog/2005/08/02/0 – habnabit Aug 27 '10 at 06:27
  • @otz, some other things not covered by that article: mixing text (unicode strings) and bytes is a nonsense operation anyway. If the bytes represent text, they should be decoded to unicode anyway. Increasing the likelihood that a meaningless operation will accidentally succeed without any warning is not exactly the best thing if you want to write sane code. As I already said, a lot of existing python code relies on ASCII being the default; if implicit encodings were turned off, the code would break. – habnabit Aug 27 '10 at 06:32
  • Interestingly, 'utf-8' is the default encoding in Python 3, so setting it in a Python 2 environment in most cases is safe. Indeed, despite the bad practice, our organization has set 'utf-8' as the default in some of our heavily-internationalized and network-heavy apps without incident. – Jason R. Coombs Jan 12 '14 at 17:58
  • @JasonR.Coombs: Python 3 generally forbids mixing of bytes and Unicode and especially implicit conversions e.g., `b"abc".encode(enc)` works in Python 2 (bytes -> implicit conversion to Unicode using default encoding -> conversion to bytes using `enc` encoding) but it breaks loudly (AttributeError) on Python 3. There are less places for bugs to hide that is why `utf-8` default encoding does not lead to the same problems as on Python 2. btw, `setdefaultencoding` on Python 2 makes finding errors *harder* -- your network-heavy app can corrupt data without you noticing it! – jfs Dec 29 '14 at 10:58
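To make @jfs's point concrete: under Python 3, the implicit bytes/text round trips that `setdefaultencoding` papers over in Python 2 fail loudly, and comparisons between the two types never decode anything. A small demonstration, assuming Python 3; `mixing_errors` is a hypothetical name:

```python
def mixing_errors():
    """Show how Python 3 reacts to implicit bytes/str mixing."""
    results = {}
    try:
        b"abc".encode("utf-8")  # bytes has no .encode() in Python 3
    except AttributeError as exc:
        results["bytes.encode"] = type(exc).__name__
    try:
        "abc" + b"abc"  # no implicit decode when concatenating
    except TypeError as exc:
        results["str + bytes"] = type(exc).__name__
    # Equality does not decode either: bytes and str are simply never equal.
    results["bytes == str"] = (b"abc" == "abc")
    return results
```

Calling `mixing_errors()` returns `{"bytes.encode": "AttributeError", "str + bytes": "TypeError", "bytes == str": False}`.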

Changing the console code page is both unnecessary and won't work (in particular, setting it to 65001 runs into a Python bug). See this question for details, and for how to print Unicode characters to the console regardless of the code page.
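A rough sketch of the "ignore the code page and use the Unicode API" approach, calling the Win32 `WriteConsoleW` function through `ctypes`. This is an illustrative sketch, not the code from the linked answer. It assumes Python 3, only does anything when stdout is attached to a real console window, and returns `False` otherwise so the caller can fall back to `print`. The name `write_console_w` is hypothetical; Python 3.6+ already uses the console's Unicode API internally (PEP 528).

```python
import ctypes
import sys


def write_console_w(text):
    """Write text with WriteConsoleW, bypassing the console code page.

    Returns False (writing nothing) when not on Windows or when stdout
    is redirected instead of being attached to a console window.
    """
    if sys.platform != "win32":
        return False
    from ctypes import wintypes  # imported here: only needed on Windows

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
    kernel32.GetConsoleMode.restype = wintypes.BOOL
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
        ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID,
    ]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    STD_OUTPUT_HANDLE = -11
    if sys.stdout is not None:
        sys.stdout.flush()  # keep ordering with anything printed earlier
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    mode = wintypes.DWORD()
    if not handle or not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False  # not a console (e.g. redirected to a file or pipe)
    # WriteConsoleW counts UTF-16 code units, not Python characters.
    n_units = len(text.encode("utf-16-le")) // 2
    written = wintypes.DWORD()
    ok = kernel32.WriteConsoleW(handle, text, n_units, ctypes.byref(written), None)
    return bool(ok)
```

Typical use is `if not write_console_w(s): print(s)`. The console font still has to contain the glyphs.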

Daira Hopwood
  • This is wrong. In order for the output to be displayed properly, the code page used must support the characters being output. The console's font must also support the characters. – mbomb007 Mar 01 '18 at 16:20
  • No, you're mistaken. The only solution that will actually work is to *ignore* code pages and use Unicode APIs, as my answer to the question I linked does. And this approach is perfectly capable of displaying characters that are *not* in the console code page. – Daira Hopwood Mar 01 '18 at 19:12
  • That problem in the linked question only occurs if you use `cp65001` in your Python code instead of `utf-8`. The OP's question does not require such usage, hence why the most upvoted answer works. – mbomb007 Mar 01 '18 at 20:49
  • I tried the linked answer, and it does work, as long as the font supports the output characters. The top answer's method does work, too, though, so your statement that this is the only way is wrong. – mbomb007 Mar 01 '18 at 21:01

Windows doesn't support UTF-8 in a console properly. The only way I know of to display Japanese in the console is by changing (on XP) Control Panel's Regional and Language Options, Advanced Tab, Language for non-Unicode Programs to Japanese. After rebooting, open a console and run "chcp" to find out the Japanese console's code page. Then either print Unicode strings or byte strings explicitly encoded in the correct code page.
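Following that advice in code means asking the console which code page it is using and encoding for that code page explicitly. A minimal sketch, assuming Python 3: `GetConsoleOutputCP` is the Win32 function that returns the console's current output code page, the helper names are hypothetical, and on non-Windows systems (or when no answer is available) it falls back to the locale's preferred encoding. On a Japanese-language Windows install, `chcp` typically reports 932.

```python
import codecs
import ctypes
import locale
import sys


def codec_for_code_page(cp):
    """Map a Windows code page number (e.g. 932) to a Python codec name."""
    name = "cp%d" % cp
    codecs.lookup(name)  # raises LookupError if Python has no such codec
    return name


def console_codec():
    """Best guess at the encoding the console will use to decode our bytes."""
    if sys.platform == "win32":
        cp = ctypes.windll.kernel32.GetConsoleOutputCP()
        if cp:  # treat 0 as "no console / no usable answer"
            try:
                return codec_for_code_page(cp)
            except LookupError:
                pass  # Python has no codec for this code page
    return locale.getpreferredencoding(False)


def encode_for_console(text, codec, errors="replace"):
    """Encode text for the console; unrepresentable characters become '?'."""
    return text.encode(codec, errors)
```

For example, `sys.stdout.buffer.write(encode_for_console("世界\n", console_codec()))` prints correctly on a 932 console and degrades to `??` on a Western code page such as 1252.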

Mark Tolonen