python utf-8 japanese

Question

I have some Japanese words I wish to convert to utf-8, as shown below:

jap_word1 = u'中山'
jap_word2 = u'小倉'

print jap_word1.encode('utf-8') # Doesn't work 
print jap_word2.encode('utf-8') # Prints properly

Why is it that one word can be converted properly to utf-8 and printed to show the same characters but not the other?

(I am using python 2.6 on Windows 7 Ultimate)

I strongly recommend that you use "jp" as an abbreviation for Japanese instead of "jap" to avoid racist connotations. In this context, it's blindingly obvious what you mean... but I still noticed. And jp is standard. — Crowbeak, Apr 08 '13 at 05:51

score 1 · Answer 1 · answered Feb 05 '11 at 18:25

Lots of things must align to print characters properly:

1. What encoding is the script saved in?
2. Do you have a # coding: xxxx statement in your script, where xxxx matches the encoding the file is saved in?
3. Does your output terminal support your encoding? Check with: import sys; print sys.stdout.encoding
   a. If not, can you change the console encoding? (chcp command on Windows)
4. Does the font you are using support the characters?
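The terminal-encoding check in the list above can be sketched in a few lines. This is only an illustration, not part of the original answer: the helper name can_encode and the list of candidate encodings are made up for the example, and the words are written as escapes so the snippet does not depend on how the file is saved. It says nothing about font support, which still has to be checked by eye.

```python
# coding: utf-8
from __future__ import print_function
import sys

def can_encode(text, encoding):
    """Return True if every character of text exists in the named encoding."""
    try:
        text.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        # LookupError means Python does not know the encoding name at all
        return False

words = [u'\u4e2d\u5c71', u'\u5c0f\u5009']  # the two words from the question

# sys.stdout.encoding can be None when output is piped or redirected
console_encoding = sys.stdout.encoding or 'ascii'
print('console encoding:', console_encoding)

# Compare the console's encoding with a few well-known ones
for candidate in (console_encoding, 'cp1252', 'cp932', 'utf-8'):
    print(candidate, [can_encode(w, candidate) for w in words])
```

If the console encoding reports False for a word, printing the unicode object directly cannot show it correctly, whatever the font.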

With the script saved in UTF-8, this works in both PythonWin and IDLE:

# coding: utf-8
jap_word1 = u'中山'
jap_word2 = u'小倉'

print jap_word1
print jap_word2

Interestingly, I got your results in IDLE when I added .encode('utf-8') to both prints, but the same code worked correctly in PythonWin, whose default output window supports UTF-8.

IDLE is a strange beast. On my system sys.stdout.encoding is 'cp1252', which doesn't support Asian characters, yet when given UTF-8-encoded byte strings it prints the first word wrong and the second one right.
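One way to see how the two words differ under cp1252 is to look at their UTF-8 bytes and try to read them back as cp1252. This is a sketch only, and the helper name reinterpret is hypothetical. Whether this byte-level difference is what actually makes IDLE behave differently for the two words is an assumption; the thread does not confirm it.

```python
# coding: utf-8
from __future__ import print_function
import binascii

def reinterpret(text, wrong_encoding):
    """Encode text as UTF-8, then decode those bytes with another encoding.

    Returns the (usually garbled) result, or None if some byte has no
    meaning in wrong_encoding.
    """
    raw = text.encode('utf-8')
    try:
        return raw.decode(wrong_encoding)
    except UnicodeDecodeError:
        return None

words = [u'\u4e2d\u5c71', u'\u5c0f\u5009']  # the two words from the question

for w in words:
    raw_hex = binascii.hexlify(w.encode('utf-8')).decode('ascii')
    if reinterpret(w, 'cp1252') is None:
        status = 'contains a byte that cp1252 leaves undefined'
    else:
        status = 'every byte maps to some cp1252 character (mojibake)'
    print(raw_hex, '->', status)
```

In Python's cp1252 codec every byte of the first word maps to a Latin character, while the second word contains the byte 0x8F, which that codec leaves undefined. So the two byte strings are not equivalent from the point of view of a cp1252 consumer.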

When I do this from the python console in Ubuntu, I don't encounter any problems either. But for Python IDLE in Windows 7, goodness ... — haha, Feb 05 '11 at 18:39

score 0 · Answer 2 · answered Feb 05 '11 at 18:02

Because your console is not using UTF-8. Run chcp 65001 in the console before running the script.


Ignacio Vazquez-Abrams


I'm using the Python IDLE. The funny thing is that only some characters are being converted correctly but not the rest. – haha Feb 05 '11 at 18:11
