How can I correctly show a Chinese string from calling raw_input() in Python?

Question

I am trying to solve a Chinese encoding issue. The only remaining obstacle is correctly displaying what the user types into raw_input(). A simplified version of my code looks like this:

#coding: utf-8
name_a = raw_input('请输入文字')
print name_a

With this, only the prompt '请输入文字' is displayed correctly. If the user types a Chinese character into raw_input(), print name_a shows escaped UTF-8 bytes such as '\xb7' instead of the character.

After searching online, I found this code, which solved my problem:

#coding: utf-8

n=raw_input(unicode('请输入文字','utf-8').encode('gbk'))
print n

It worked! But when I tried to put it into my own code, my code would not run. Then something even stranger happened: when I copied this code into another empty file, it did not work there either. The console window just flashed and the program ended (I know how Python behaves on Windows, so I had added x = input() at the end of the file). Later I deleted the original test .py file that contained the second snippet, and now I cannot run this code in any newly created file.

I'm using Python 2.7 on Windows XP.

What happened?

And is there another way to display the Chinese text that the user typed into raw_input()?

Thank you very much.

Perhaps relevant, since this looks like this might be a Windows issue? http://stackoverflow.com/q/4942305/646543 — Michael0x2a, Jan 07 '14 at 06:58
@alvas, I got some useful ideas here, and my problem was solved by coincidence: I realized I should use %s instead of %r in my code, and then the Chinese characters were displayed. By the way, I didn't post my whole code in this question. — Mario, Jan 09 '14 at 14:00
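The %r versus %s difference mentioned in the comment above can be reproduced in isolation. This is a minimal sketch, assuming the console delivers UTF-8 bytes (which is what raw_input() returns on a UTF-8 terminal in Python 2); the variable names are illustrative only. Formatting with %r goes through repr(), which escapes non-ASCII bytes, whereas in Python 2 %s passes the bytes through unchanged so the terminal can render them.

```python
# -*- coding: utf-8 -*-
# Assumption: the console hands the program UTF-8 encoded bytes,
# which is what raw_input() returns on a UTF-8 terminal in Python 2.
typed = u'中文'.encode('utf-8')

# %r formats the repr(), so non-ASCII bytes appear as \x.. escapes.
as_repr = '%r' % (typed,)

# Decoding with the console's encoding recovers the readable text.
as_text = typed.decode('utf-8')

print(as_repr)  # escaped form containing \xe4\xb8\xad\xe6\x96\x87
```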

C0deH4cker · Answer 1 · 2014-01-07T06:59:57.150

1

This works for me:

x = raw_input(u'请输入文字'.encode("utf8"))

Also, I was able to just print(x) and have it show the Chinese characters, so you are likely running into an OS (or terminal client) font rendering issue.

Side note: NEVER use input() in Python 2.x. It implicitly evals all data the user enters (input() == eval(raw_input())), so something like __import__("os").system("rm -rf /*") would wipe the hard drive of the computer running the Python script.
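To make the side note concrete, here is a small sketch that emulates what Python 2's input() does by calling eval() on a string directly. The string user_text is hypothetical user input and deliberately calls something harmless. ast.literal_eval is shown as one possible option when a literal value really is wanted; it is not part of the original answer.

```python
import ast
import os

# Hypothetical text a user might type at the prompt (harmless here).
user_text = '__import__("os").getcwd()'

# Python 2's input() is equivalent to eval(raw_input()), so the text
# above would be executed as code rather than kept as a string.
evaluated = eval(user_text)

# ast.literal_eval only accepts literals and rejects function calls.
try:
    ast.literal_eval(user_text)
    rejected = False
except ValueError:
    rejected = True

print(evaluated == os.getcwd(), rejected)
```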

edited Jan 07 '14 at 06:59

answered Jan 07 '14 at 06:45

C0deH4cker


I just tried testing this, and it didn't work for me. Command prompt displays "Φ»╖Φ╛ôσàÑµûçσ¡ù". Which OS are you using? – Michael0x2a Jan 07 '14 at 06:49
I'm confused. How is `input()` related to the crazy `rm -rf` from `os.system`? – alvas Jan 07 '14 at 06:49
@alvas Since in Python 2.x `input()` is equivalent to `eval(raw_input())`, any user input is evaluated as Python code. So if for example this script were running on a server and the user entered that code, your server's hard drive would be wiped. I like to always write and encourage secure coding habits. – C0deH4cker Jan 07 '14 at 06:52
@Michael0x2a Works on OS X and Ubuntu. – C0deH4cker Jan 07 '14 at 06:55
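The garbled prompt reported for the Windows command prompt looks like UTF-8 bytes being interpreted in a legacy code page. The sketch below assumes the console was using code page 437; under that assumption, decoding the UTF-8 bytes of the prompt as cp437 should reproduce the garbled string quoted above. The GBK line assumes a Simplified Chinese Windows console and is only there to show that the bytes depend on the target encoding.

```python
# -*- coding: utf-8 -*-
prompt = u'请输入文字'

# Bytes that the script sends to the console when encoding as UTF-8.
utf8_bytes = prompt.encode('utf-8')

# Assumption: the console interprets incoming bytes as code page 437.
shown_by_console = utf8_bytes.decode('cp437')

# Encoding for the console's own code page instead (GBK assumed here
# for a Simplified Chinese Windows system) produces different bytes.
gbk_bytes = prompt.encode('gbk')
```

This is why the same script can work on OS X and Ubuntu, where terminals usually expect UTF-8, and fail in the Windows command prompt.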

alvas · Answer 2 · 2014-01-07T07:00:04.580

1

Using this code:

#coding: utf-8

x = raw_input(u'请输入文字: '.encode("utf8"))
print "reprinting:",x

[out]:

$ python test.py
请输入文字: 123
reprinting: 123

So possibly it's not a Python issue but an issue with your OS. If you get jumbled output on the console, it may be the font.

edited Jan 07 '14 at 07:00

answered Jan 07 '14 at 06:52

alvas


score 0 · Answer 3 · edited Jan 07 '14 at 07:34

0

Try this:

#coding: utf-8
import sys

# u'' prefix: calling .encode() on a plain byte string would first decode it as ASCII and fail
name_a = raw_input(u'请输入文字：'.encode(sys.stdout.encoding))
print name_a.decode(sys.stdout.encoding)
print name_a

Sample output:

请输入文字：中文字符
中文字符
中文字符

edited Jan 07 '14 at 07:34

C0deH4cker


answered Jan 07 '14 at 07:10

zufuliu


I tested this in Python, but it raises UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 0: ordinal not in range(128). I was using Python 2.7 on Ubuntu at the time. – Mario Jan 07 '14 at 12:26
The above code works when Python's default encoding (sys) is utf-8. Add the following two lines after import sys and it should work: reload(sys); sys.setdefaultencoding('utf-8') – zufuliu Jan 08 '14 at 07:44
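The UnicodeDecodeError in the comment above comes from calling .encode() on a byte string: Python 2 first decodes the bytes implicitly with the ASCII codec, and that step fails on the first non-ASCII byte. A minimal sketch that performs the same decode explicitly, assuming the source file is saved as UTF-8; the GBK target in the last line is just an example encoding.

```python
# -*- coding: utf-8 -*-
# The UTF-8 bytes of the prompt, i.e. what a plain '...' literal holds
# in a Python 2 source file saved as UTF-8.
prompt_bytes = u'请输入文字：'.encode('utf-8')

# Python 2 runs an implicit ASCII decode before str.encode();
# doing that decode explicitly raises the same kind of error.
try:
    prompt_bytes.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Starting from text (a u'' literal) needs no implicit decode.
encoded_prompt = u'请输入文字：'.encode('gbk')
```

Starting from a u'' literal avoids the implicit decode without changing the interpreter-wide default encoding.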

Jeremy Anifacc · Answer 4 · 2015-10-18T04:31:57.787

0

My script main.py

# -*- coding: utf-8 -*-
import sys
reload(sys) 
sys.setdefaultencoding('utf-8')

name = raw_input("中文名字: ".encode(sys.stdout.encoding)) 
print name

or

name = raw_input("中文名字: ".encode(sys.stdin.encoding))

I run it from Windows PowerShell with Python 2.7.8:

python main.py

and it displays correctly:

中文名字: 中文名字测试
中文名字测试

Hope it helps someone.

edited Oct 18 '15 at 04:31

answered Oct 18 '15 at 04:23

Jeremy Anifacc


Do not cargo-cult the `sys.setdefaultencoding()` trick please. There is no need to clobber the standard system codec, and this can break code that actually relies on non-ASCII data throwing an exception when implicitly decoded or encoded. – Martijn Pieters Oct 27 '16 at 09:43
Thanks very much for your comments. I will never do it again; I am a newcomer. – Jeremy Anifacc Mar 31 '17 at 03:20
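One way to follow the advice above is to decode explicitly with the console's declared encoding instead of calling sys.setdefaultencoding(). The helpers below are a hypothetical sketch, not code from any answer in this thread: stream_encoding and to_text are made-up names, and the sample bytes assume a console that delivers GBK.

```python
# -*- coding: utf-8 -*-
import sys


def stream_encoding(stream, fallback='utf-8'):
    """Return the stream's declared encoding, or a fallback if it has none."""
    return getattr(stream, 'encoding', None) or fallback


def to_text(raw_bytes, encoding):
    """Decode bytes read from the console into text, replacing bad bytes."""
    return raw_bytes.decode(encoding, 'replace')


# Hypothetical input: the GBK bytes a Chinese Windows console would
# deliver for a two-character name.
typed = b'\xd6\xd0\xce\xc4'
name = to_text(typed, 'gbk')

# Encoding reported for standard input in the current environment.
in_encoding = stream_encoding(sys.stdin)
```

In a Python 2 script this would be used as name = to_text(raw_input(prompt), stream_encoding(sys.stdin)), leaving the interpreter's default codec untouched.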
