2
#!/usr/bin/env python
# -*- coding: utf8 -*-
print "私"
print u"私"

The result:

ç§
UnicodeEncodeError: 'ascii' codec can't encode character u'\u79c1' in position 0: ordinal not in range(128)

Or, in IDLE, for both u"私" and "私":

>>> print "私"
Unsupported characters in input

I've followed all the advice I could find, which says to put the "coding" line under the shebang. All my web browsers display kanji fine, and I can type it fine, but this garbage comes out when I try to use it in Python :( Any ideas?

Comic Sans MS Lover
Matthew
  • What's your OS? What's your console or shell encoding? – CharlesB Jul 05 '11 at 15:20
  • @Wooble, the characters will be converted from UTF-8 input to an internal representation of codepoints. It's the internal representation that's being displayed in the error. – Mark Ransom Jul 05 '11 at 15:39
  • Running Python 2.7.1 on WinVista :/ – Matthew Jul 05 '11 at 16:55
  • Which Python version have you installed? In IDLE 2.6.5: `>>> # -*- coding: utf-8 -*-` `>>> print "私"` `私` – Ant Jul 05 '11 at 15:16
  • I have Python 2.7.1 installed on Win Vista. I tried your example, but still no luck :( – Matthew Jul 05 '11 at 16:46
  • Yeah, the Windows Command Prompt is pretty much a dead loss for Unicode(*). You can spit out `私` onto a web page with Python, or encode it into a file, but console IO and Unicode don't mix well under Windows. The nearest you can get is using `chcp 65001` and outputting with UTF-8-encoding, but there are significant problems with that. – bobince Jul 08 '11 at 00:02
  • (*: when using the standard ANSI C stdio library, as Python and most other cross-platform languages do. There *is* also a Win32-specific `WriteConsoleW` function which can be used to get Unicode output into the terminal window, but there are some problems making that work smoothly with the stdio model, and even then most characters won't render right with the default font settings. In summary, Windows+Unicode+console is still a right old mess and doesn't work well for many languages, not just Python.) – bobince Jul 08 '11 at 00:05

4 Answers

6

You specified the encoding of the source file, and presumably saved the file as UTF-8.

Your stdout is still using ASCII, though, so the failure is expected.

You have an encoding issue, not a decoding issue. Python reads your Unicode characters just fine, and will probably be able to save them to a file if you choose the right encoding.
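As a minimal sketch of the "save them to a file" point, assuming Python 2.6+ or 3 and a throwaway temporary file (the helper name `roundtrip_utf8` is made up for illustration), something like this writes the character with an explicit encoding and reads it back:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def roundtrip_utf8(text):
    """Write text to a temporary file as UTF-8 and read it back (illustrative helper)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # io.open accepts an explicit encoding on both Python 2.6+ and Python 3.
        with io.open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
        with io.open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    finally:
        os.remove(path)


restored = roundtrip_utf8(u"\u79c1")  # u"\u79c1" is 私
```

No console is involved here, which is why this path tends to work even when printing does not.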

Still, stdout is not always Unicode-compatible, especially on Windows.

You could do something like `sys.stdout.write(string.encode("utf-8"))`, where `string` is your Unicode string, and you will not get an error, but this does not mean that you will see the characters on the screen.
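A slightly more defensive variant of that idea, as a sketch only: look up which encoding stdout reports, fall back to ASCII if it reports none, and replace characters that cannot be represented instead of raising. The helper name `encode_for_stdout` is hypothetical.

```python
import sys


def encode_for_stdout(text, encoding=None):
    """Return text as bytes in stdout's encoding; unencodable characters become '?'."""
    # sys.stdout.encoding can be None, for example when output is piped on Python 2.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, "replace")


ascii_bytes = encode_for_stdout(u"\u79c1", "ascii")
utf8_bytes = encode_for_stdout(u"\u79c1", "utf-8")
```

With `"ascii"` the result is a question mark rather than a `UnicodeEncodeError`; with `"utf-8"` it is three bytes, which still only display correctly if the console interprets them as UTF-8.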

sorin
3

You need a terminal or IDE that supports UTF-8, or at least an encoding that supports Japanese. PythonWin, from the Pywin32 extension library, is an IDE that will work.
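One rough way to check whether a given encoding "supports Japanese" in the sense needed here is to try encoding the text with it. This sketch assumes the codec names below (`cp932` is Python's name for the Windows Japanese code page, `cp437` for the old US console code page); the helper `encoding_supports` is made up for illustration.

```python
def encoding_supports(text, encoding):
    """Return True if every character of text can be encoded with the named codec."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # UnicodeEncodeError: a character has no mapping in this encoding.
        # LookupError: Python does not know a codec by this name.
        return False
    return True


kanji = u"\u79c1"  # 私
results = {name: encoding_supports(kanji, name)
           for name in ("ascii", "cp437", "cp932", "utf-8")}
```

Passing `sys.stdout.encoding` as the codec name gives a quick hint about whether the current terminal can take the text, although it says nothing about whether the font can draw it.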

Mark Tolonen
  • I think this has solved it! Now using python 3.2 too! ^_^ I owe you a beer if you ever come to Japan. – Matthew Jul 12 '11 at 15:47
2

Try this:

#!/usr/bin/env python
# -*- coding: utf8 -*-
print unicode("私","UTF-8")
pradyunsg
pahan
0

sorin's answer is correct. There's another question which covers the same ground: "Setting the correct encoding when piping stdout in Python".

Python is applying a default encoding when it writes the output, and this encoding is not UTF-8.
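Both symptoms from the question can be reproduced without a console. This is a sketch under an assumption: the asker's console code page is not stated, so Latin-1 stands in here for "some single-byte encoding".

```python
text = u"\u79c1"  # 私

# A byte-string literal in a UTF-8 source file holds these three bytes.
utf8_bytes = text.encode("utf-8")

# Interpreting those bytes in a single-byte encoding yields garbage like "ç§".
garbled = utf8_bytes.decode("latin-1")

# Encoding the Unicode string with the ASCII codec fails outright.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

The third garbled character is a control character, which is why only two visible characters show up in the output.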

The error from IDLE is because IDLE interprets input according to the system locale. Windows does not provide a locale that accepts UTF-8 input, so the default does not accept arbitrary Unicode. You may change the default with the simple instructions in this answer. You'll still get the incorrect output without re-encoding it.
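The re-encoding step could look roughly like this. It is a sketch, not the method from the linked answer: it wraps a byte stream in a UTF-8 writer using `codecs.getwriter`, with an in-memory `io.BytesIO` standing in for stdout so the result can be inspected.

```python
import codecs
import io

# In-memory stand-in for a byte-oriented output stream.
raw = io.BytesIO()

# The writer encodes every Unicode string to UTF-8 before passing it on.
writer = codecs.getwriter("utf-8")(raw)
writer.write(u"\u79c1")  # 私

written = raw.getvalue()
```

For real output the byte stream would be `sys.stdout` on Python 2 or `sys.stdout.buffer` on Python 3; whether the characters then render still depends on the console.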

Community
Mark Ransom