How to use Python to convert a unicode string to the real string

Question

I have used Python to get some info through urllib2, but the info comes back as a string of \u escape sequences rather than readable text.

I've tried the following:

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print unicode(a).encode("gb2312")

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a.encode("utf-8").decode("utf-8")

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print u""+a

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print str(a).decode("utf-8")

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print str(a).encode("utf-8")

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a.decode("utf-8").encode("gb2312")

but all results are the same:

\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728

And I want to get the following Chinese text:

方法，删除存储在

Which python version are you using? Maybe you need `from __future__ import unicode_literals` — gil, Feb 23 '16 at 12:50
My answer: Just use Python 3 and `a` will be your expected string, so you don't need to convert it yourself. — Remi Guan, Feb 23 '16 at 12:51
And also [this one](http://stackoverflow.com/questions/2688020/how-to-print-chinese-word-in-my-code-using-python). Oh hey, [there's also another way](http://stackoverflow.com/questions/19371953/python-2-7-converting-unicode-to-chinese-character). — Remi Guan, Feb 23 '16 at 13:22

Rikka · Accepted Answer · 2016-02-23T13:13:38.107

2

You need to convert the string to a unicode string.

First of all, \u is not an escape sequence in a plain (non-unicode) Python 2 string literal, so the backslashes in a are kept as literal backslashes:

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

print a # Prints \u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728

a       # In the interactive shell, shows '\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728'

So playing with the encoding / decoding of this escaped string makes no difference.
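As a rough illustration of why the attempts in the question all print the same thing, here is a small Python 3 sketch. The r prefix and the name raw are my own additions: the raw literal is only there to reproduce what a plain "..." literal holds in Python 2. It checks that the string is made of literal backslash sequences and that a UTF-8 round trip leaves it unchanged.

```python
# Python 3 sketch. The r prefix keeps the backslashes literal, which mimics
# what a plain "..." literal does with \u in Python 2.
raw = r"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

# Each \uXXXX is six separate ASCII characters, not one Chinese character.
char_count = len(raw)

# Encoding to UTF-8 and decoding back only round-trips those ASCII characters.
round_tripped = raw.encode("utf-8").decode("utf-8")

print(char_count)            # 48 (8 escape sequences of 6 characters each)
print(round_tripped == raw)  # True
```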

You can either use a unicode literal or convert the string into a unicode string.

To use a unicode literal, just add a u in front of the string:

a = u"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

To convert an existing string into a unicode string, call unicode with unicode_escape as the encoding parameter:

print unicode(a, encoding='unicode_escape') # Prints 方法，删除存储在

I bet you are getting the string from a JSON response, so the second method is likely to be what you need.

BTW, the unicode_escape encoding is a Python-specific encoding which is used to

"Produce a string that is suitable as Unicode literal in Python source code"
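Since the unicode builtin is gone in Python 3, the call above does not carry over directly. A minimal sketch of one possible Python 3 equivalent follows. It assumes the input is a str containing only ASCII characters (just the backslash escapes); the names raw and text are illustrative.

```python
# Python 3 sketch: str has no .decode(), so go through bytes first.
raw = r"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

# Assumption: raw is pure ASCII (only backslash escapes), so encoding to
# ASCII is lossless; non-ASCII input would raise UnicodeEncodeError here.
text = raw.encode("ascii").decode("unicode_escape")

# text now holds 8 real characters instead of 48 ASCII ones.
print(len(text))
```

On a terminal that can display Chinese, print(text) should then show the text the question asks for. If the input may already contain non-ASCII characters, this simple version is not enough and would need a different encoding step.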

edited Feb 23 '16 at 13:13

answered Feb 23 '16 at 13:08

Yes, `unicode_escape` seems the way to go. – mhawke Feb 23 '16 at 13:27
Thank you very, very much! – Lex Feb 24 '16 at 01:56
`a = '\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728'; print(unicode(a, encoding='unicode_escape'))` fails in Python 3 with "decoding str is not supported". Which unicode module should I import, or should I do this another way in Python 3? Thanks very much! – Lex Aug 12 '16 at 00:48
How to implement this in Python 3? – linrongbin Jan 18 '21 at 06:28

mhawke · Answer 2 · 2016-02-23T13:29:30.260

Where are you getting this data from? Perhaps you could share the method by which you are downloading and extracting it.

Anyway, it kind of looks like a remnant of some JSON encoded string? Based on that assumption, here is a very hacky (and not entirely serious) way to do it:

>>> a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
>>> a
'\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728'
>>> s = '"{}"'.format(a)
>>> s
'"\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728"'
>>> import json
>>> json.loads(s)
u'\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728'
>>> print json.loads(s)
方法，删除存储在

This involves recreating a valid JSON encoded string by wrapping the given string a in double quotes, then decoding the JSON string into a Python unicode string.
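The same idea can be sketched in Python 3, where json.loads returns a str. This is only an illustration under the same assumption as above, namely that the fragment came out of a JSON string; the names raw, json_document and decoded are made up for the example.

```python
import json

# The r prefix keeps the backslashes literal, as they would be in a
# fragment cut out of a JSON response.
raw = r"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

# Wrap the fragment in double quotes so it becomes a complete JSON string.
# Assumption: raw contains no unescaped double quotes or control characters,
# otherwise json.loads would raise an error.
json_document = '"{}"'.format(raw)

# json.loads interprets the \uXXXX escapes and returns a normal str.
decoded = json.loads(json_document)
```

If the data really does come from a JSON response, parsing the whole response with json.loads in the first place is likely the cleaner route, since the escapes are then decoded without any manual wrapping.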
