Python STILL won't allow Japanese Characters despite specifying the encoding

Question

#!/usr/bin/env python
# -*- coding: utf8 -*-
print "私"
print u"私"

The result:

ç§
UnicodeEncodeError: 'ascii' codec can't encode character u'\u79c1' in position 0: ordinal not in range(128)

Or, in IDLE, for both u"私" and "私":

>>> print "私"
Unsupported characters in input

I've followed all the advice I could find, which says to put the "coding" line under the shebang. All my web browsers display kanji fine, and I can type it fine. But this garbled output appears when I try to use it in Python :( Any ideas?

@Wooble, the characters will be converted from UTF-8 input to an internal representation of codepoints. It's the internal representation that's being displayed in the error. — Mark Ransom, Jul 05 '11 at 15:39
Which Python version have you installed? In IDLE 2.6.5, `>>> # -*- coding: utf-8 -*-` followed by `>>> print "私"` prints `私`. — Ant, Jul 05 '11 at 15:16
I have Python 2.7.1 installed on Win Vista. I tried your example, but still no luck :( — Matthew, Jul 05 '11 at 16:46
Yeah, the Windows Command Prompt is pretty much a dead loss for Unicode(*). You can spit out `私` onto a web page with Python, or encode it into a file, but console IO and Unicode don't mix well under Windows. The nearest you can get is using `chcp 65001` and outputting with UTF-8-encoding, but there are significant problems with that. — bobince, Jul 08 '11 at 00:02
(*: when using the standard ANSI C stdio library, like Python and most other cross-platform languages do. There *is* also a Win32-special `WriteConsoleW` function which can be used to get Unicode output into the terminal window, but there are some problems making that work smoothly with the stdio model and even then, most characters won't render right for the default font settings. In summary, Windows+Unicode+console is still a right old mess and doesn't work well for many languages, not just Python.) — bobince, Jul 08 '11 at 00:05

score 6 · Answer 1 · answered Jul 05 '11 at 15:30

You specified the encoding of the source file and presumably saved the file as UTF-8.

However, your stdout is using the ASCII codec, so the failure is expected.

You have an encoding issue, not a decoding issue. Python reads your Unicode characters just fine, and will probably be able to save them to a file if you choose the right encoding.

Still, stdout is not always Unicode-compatible, especially on Windows.

You could do something like `sys.stdout.write(strin.encode('utf-8'))`, where `strin` is a unicode string, and you will not get an error, but this does not mean that you will see the characters on the screen.
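
The difference between encoding text and displaying it can be sketched as follows. This is a minimal illustration rather than the poster's code: the helper names `encode_or_none` and `write_utf8` are hypothetical, and Latin-1 is only an assumption about the kind of code page that produced the garbled output in the question.

```python
import io
import sys

TEXT = u"\u79c1"  # U+79C1, the kanji from the question, written as an escape


def encode_or_none(text, codec):
    """Return text encoded with codec, or None if the codec cannot represent it."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


def write_utf8(text, stream=None):
    """Write text to a byte stream as UTF-8, bypassing the stream's default codec."""
    if stream is None:
        # Python 3 exposes the underlying byte stream as sys.stdout.buffer;
        # on Python 2, sys.stdout already accepts byte strings.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    stream.write(text.encode("utf-8"))
    stream.write(b"\n")


ascii_bytes = encode_or_none(TEXT, "ascii")  # None: ASCII has no such character
utf8_bytes = encode_or_none(TEXT, "utf-8")   # three bytes: e7 a7 81

# Assumption: a Latin-1-like console reads those three bytes as three
# separate characters, which is one way to end up with garbled output.
garbled = utf8_bytes.decode("latin-1")

# Write to an in-memory byte stream to show exactly what would be sent.
buf = io.BytesIO()
write_utf8(TEXT, buf)
```

Calling `write_utf8(TEXT)` with no stream sends the same bytes to the real stdout. Whether they render as the kanji still depends on the terminal.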

answered Jul 05 '11 at 15:30

sorin

Probably not Windows, with the `#!` at the start of the script. – Mark Ransom Jul 05 '11 at 15:37
Thanks for your answer, but it still doesn't like it. I'm running Python 2.7.1 on WinVista – Matthew Jul 05 '11 at 16:52

score 3 · Accepted Answer · answered Jul 06 '11 at 21:36

You need a terminal or IDE that supports UTF-8, or at least an encoding that supports Japanese. PythonWin, from the Pywin32 extension library, is an IDE that will work.
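
To see what "an encoding that supports Japanese" means in practice, here is a small sketch that tries a handful of codecs against the character from the question. The candidate list and the helper name `can_encode` are illustrative choices, not part of the answer.

```python
TEXT = u"\u79c1"  # U+79C1, the kanji from the question

# Two Unicode encodings, three Japanese legacy encodings, and two
# encodings that have no Japanese repertoire.
CANDIDATES = ["utf-8", "utf-16", "cp932", "shift_jis", "euc_jp", "ascii", "cp1252"]


def can_encode(text, codec):
    """Return True if codec can represent every character in text."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


supported = [c for c in CANDIDATES if can_encode(TEXT, c)]
unsupported = [c for c in CANDIDATES if not can_encode(TEXT, c)]
```

A terminal whose encoding lands in `unsupported` cannot show the character regardless of what the script does. One whose encoding is in `supported` still needs a font that contains the glyph.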

answered Jul 06 '11 at 21:36

Mark Tolonen

I think this has solved it! Now using python 3.2 too! ^_^ I owe you a beer if you ever come to Japan. – Matthew Jul 12 '11 at 15:47

score 2 · Answer 3 · edited Mar 04 '13 at 10:29

Try this:

#!/usr/bin/env python
# -*- coding: utf8 -*-
print unicode("私","UTF-8")

edited Mar 04 '13 at 10:29

pradyunsg

answered Jul 06 '11 at 03:41

pahan

score 0 · Answer 4 · edited May 23 '17 at 12:09

sorin's answer is correct. Another question covers the same ground: "Setting the correct encoding when piping stdout in Python".

Python applies a default encoding when it writes the output, and this encoding is not UTF-8.

The error from IDLE occurs because IDLE interprets input according to the system locale. Windows does not provide a locale that accepts UTF-8 input, so the default does not accept arbitrary Unicode. You may change the default with the simple instructions in this answer. Even then, you will still get incorrect output unless you re-encode it.
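
One way to re-encode output explicitly is to wrap a byte stream in a UTF-8 writer, roughly as below. This is a sketch under assumptions, not necessarily the method the linked answers describe: `utf8_writer` and `stdout_encoding` are hypothetical helper names, and the demonstration writes to an in-memory stream instead of the real console.

```python
import codecs
import io
import sys


def utf8_writer(byte_stream):
    """Wrap a byte stream so that text written to it is encoded as UTF-8."""
    return codecs.getwriter("utf-8")(byte_stream)


def stdout_encoding():
    """Return the encoding Python chose for stdout, or None if it has none."""
    return getattr(sys.stdout, "encoding", None)


# Demonstrate on an in-memory byte stream rather than the real console.
sink = io.BytesIO()
out = utf8_writer(sink)
out.write(u"\u79c1\n")  # U+79C1, the kanji from the question
encoded = sink.getvalue()
```

To apply the same idea to the real stdout, pass `sys.stdout` on Python 2 or `sys.stdout.buffer` on Python 3. Setting the `PYTHONIOENCODING` environment variable to `utf-8` is another option. Either way, the console still has to interpret and render the bytes correctly.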
