How to convert a string containing unicode escapes into a unicode string containing those characters?

Question

I have this string

a = "ID\u65e0\u6548 99999"

>>> print u"ID\u65e0\u6548 99999"
ID无效 99999
>>> print unicode("ID\u65e0\u6548 99999")
ID\u65e0\u6548 99999

Now I want to get the first output, ID无效 99999, without using the u prefix. Is there a solution? Thanks!

I'm using Python 2.7 and need this in my project, so switching to Python 3 is not an option.

The "fix" is to use Python 3 instead. There is no rational reason to try to avoid the `u` prefix for Unicode strings in Python 2; you should explain why it's unacceptable (or perhaps just delete this question, as it is likely to collect downvotes as it is now). — tripleee, Sep 11 '15 at 07:34
possible duplicate of [Suppress the u'prefix indicating unicode' in python strings](http://stackoverflow.com/questions/761361/suppress-the-uprefix-indicating-unicode-in-python-strings) — Holt, Sep 11 '15 at 07:38
@tripleee: I don't see how Python 3 is relevant here. You could use `from __future__ import unicode_literals` if you want to interpret `"abc"` as a Unicode string on Python 2 and you can't write `u"abc"` explicitly for some weird reason. — jfs, Sep 11 '15 at 20:44
It's not about avoiding the prefix. Perhaps this string with unicode escapes isn't a literal but rather the result of reading a file or such. In that case neither use of the prefix nor `unicode_literals` would help. — Dan D., Sep 12 '15 at 05:15
@DanD.: if it is not a literal in the source code then perhaps `json.loads()` should be used if the input is JSON, or perhaps `ast.literal_eval()` if the input is a Python literal in a variable. If it is the latter then fixing the upstream to avoid generating the data in such a format is also an option. If you need to call `.decode('unicode_escape')`, something is broken. — jfs, Sep 12 '15 at 10:54
@J.F.Sebastian Thanks for the generic consideration. Like I said, I just want to use these two prints and "\uxxx" like string in my project. This is not a sample but real code from my project. I can use "u" prefix in other places but here I just don't know why they are different. So, does it make sense to you? Dan's answer does help to answer this question, although it's not a general one. — Stephen Lin, Sep 12 '15 at 14:29
@StephenLin: if it is real code then it is a bug and `.decode('unicode-escape')` is the *wrong* way to fix it; you should use `u''` instead of `''`. *"..but here I just don't know why they are different"*: they are different because bytestring literals and Unicode literals are different (moreover `type('')` != `type(u'')`) and they use (similar but) different syntax in Python. Does it surprise you that `r'[\\]'` and `'[\\]'` are different? — jfs, Sep 12 '15 at 14:35
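
The two alternatives jfs mentions in the comments can be sketched roughly as follows. This is only an illustration, not code from the thread: the variable names are made up, and it assumes the incoming text is a complete JSON string or a complete Python unicode literal (quotes included), which may not match the asker's data.

```python
import ast
import json

# Assumed inputs: text containing literal backslash-u sequences,
# for example read from a file or a network response.
json_text = '"ID\\u65e0\\u6548 99999"'      # a JSON string document
literal_text = 'u"ID\\u65e0\\u6548 99999"'  # a Python unicode literal

# The JSON parser decodes \uXXXX escapes as part of parsing.
from_json = json.loads(json_text)

# literal_eval evaluates the literal without running arbitrary code.
from_literal = ast.literal_eval(literal_text)

print(from_json == from_literal == u"ID\u65e0\u6548 99999")  # True
```

Both calls return a unicode string, so no separate `unicode_escape` step is needed when the input really is in one of these formats.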

Dan D. · Accepted Answer · 2015-09-11T07:36:00.843


You can decode the byte string with the `unicode_escape` codec to get a unicode string in which those escapes are replaced by the actual characters:

>>> print "ID\u65e0\u6548 99999".decode('unicode_escape')
ID无效 99999

(Tested under Python 2.7.3)
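
As a rough sketch of why this works: in a Python 2 byte string literal `\u` is not an escape, so `"\u65e0"` is stored as six separate bytes (a backslash, `u`, and four hex digits), and the codec is what turns them into one character. The helper name `decode_escapes` below is hypothetical. The example uses `codecs.decode` and an explicit `b` prefix with doubled backslashes, which should produce the same bytes as the literal in the question while also running on Python 3.

```python
import codecs


def decode_escapes(raw):
    """Hypothetical helper: decode literal \\uXXXX sequences in a byte string."""
    return codecs.decode(raw, 'unicode_escape')


# Same bytes as "ID\u65e0\u6548 99999" in Python 2: the backslashes are literal.
raw = b"ID\\u65e0\\u6548 99999"
decoded = decode_escapes(raw)

print(len(raw))      # 20: each escape occupies six bytes
print(len(decoded))  # 10: each escape became a single character
print(decoded == u"ID\u65e0\u6548 99999")  # True
```

In Python 2, `raw.decode('unicode_escape')` is equivalent to the `codecs.decode` call. Note that the codec treats any non-escaped bytes as Latin-1, so it is only a safe fit when the rest of the input is plain ASCII.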

edited Sep 11 '15 at 07:36

answered Sep 11 '15 at 07:32

That works! Thanks! Could you please help explain why? – Stephen Lin Sep 11 '15 at 07:35
