# coding=utf-8
import codecs
str_unicode = "\\u201c借\\u201d东风"
str_bytes = codecs.decode(str_unicode, 'unicode-escape')
print(str_bytes)
It prints “å”ä¸é£ at the console.
Francisco Couzo correctly describes your issue. If you have control of the string, you should avoid double-escaping the quotation mark characters in your Unicode string. But I'm guessing that you didn't actually write that string yourself as a literal, and instead got it from an external source (like a file).
If your Unicode string already has the extra escape characters in it, you can fix the problem by first encoding your data (using str.encode), then stripping the extra backslashes from the already encoded characters, then finally decoding again:
str_unicode = "\\u201c借\\u201d东风" # or somefile.read(), or whatever
fixed = str_unicode.encode('unicode-escape').replace(b'\\\\', b'\\').decode('unicode-escape')
print(fixed) # prints “借”东风
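The one-liner above can be split into its three steps to make each stage visible. This is only a sketch: the helper name `unescape_mixed` is made up for illustration, and it assumes the input contains no backslashes that are meant to stay literal (a Windows path, for example, would be mangled).

```python
def unescape_mixed(text: str) -> str:
    """Turn literal \\uXXXX sequences into characters, keeping real non-ASCII text intact.

    Hypothetical helper; assumes every backslash in `text` starts an escape sequence.
    """
    # Step 1: escape everything to pure ASCII bytes. Real characters such as
    # 借 become \uXXXX escapes, and each literal backslash is doubled.
    escaped = text.encode('unicode-escape')
    # Step 2: collapse the doubled backslashes so that every escape sequence
    # has exactly one backslash.
    collapsed = escaped.replace(b'\\\\', b'\\')
    # Step 3: decode all escapes back into characters.
    return collapsed.decode('unicode-escape')


raw = "\\u201c借\\u201d东风"   # literal backslashes, as if read from a file
print(unescape_mixed(raw))
```

Encoding first is what avoids the garbled output: after step 1 the data is plain ASCII, so the decoder never has to interpret the raw bytes of the Chinese characters.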
You're not escaping the characters correctly; you have an extra \ in each escape sequence:
>>> print("\u201c借\u201d东风")
“借”东风
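A quick way to see the difference is to compare string lengths. The variable names here are only for illustration:

```python
one_char = "\u201c"    # the parser processes the escape: one character
six_chars = "\\u201c"  # the backslash itself is escaped: six literal characters
raw_six = r"\u201c"    # a raw string also keeps the six literal characters

print(len(one_char), len(six_chars), len(raw_six))  # 1 6 6
print(six_chars == raw_six)                         # True
```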
The Unicode standard contains a lot of tables listing characters and their corresponding code points:
0061 'a'; LATIN SMALL LETTER A
0062 'b'; LATIN SMALL LETTER B
0063 'c'; LATIN SMALL LETTER C
...
007B '{'; LEFT CURLY BRACKET
...
2167 'Ⅷ'; ROMAN NUMERAL EIGHT
2168 'Ⅸ'; ROMAN NUMERAL NINE
...
265E '♞'; BLACK CHESS KNIGHT
265F '♟'; BLACK CHESS PAWN
...
1F600 '😀'; GRINNING FACE
1F609 '😉'; WINKING FACE
...
You can find out more in the Python 3 documentation, in the Unicode HOWTO.
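If you want to look up entries like the ones in that table yourself, the standard-library unicodedata module exposes the character names. A minimal sketch, where `describe` is a made-up helper name:

```python
import unicodedata


def describe(ch: str) -> str:
    """Format one character the way the table above does (hypothetical helper)."""
    # ord() gives the code point; unicodedata.name() gives the official name.
    return f"{ord(ch):04X} {ch!r}; {unicodedata.name(ch)}"


for ch in "a{\u2167\u265e":
    print(describe(ch))
```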