How to print the right characters from a Unicode string like "\\u201c借\\u201d东风" in Python 3?

Question

# coding=utf-8
import codecs

str_unicode = "\\u201c借\\u201d东风"
str_bytes = codecs.decode(str_unicode, 'unicode-escape')
print(str_bytes)

It prints “å”ä¸é£ at the console.

What is it you're trying to print? What's your expected output? — Adam Smith, Apr 21 '19 at 06:23

score 3 · Accepted Answer · answered Apr 21 '19 at 06:31

Francisco Couzo correctly describes your issue. If you have control of the string, you should avoid escaping the backslashes in front of the quotation mark code points in your Unicode string. But I'm guessing that you didn't actually write that string yourself as a literal, but rather got it from an external source (like a file).

If your Unicode string already has the extra escape characters in it, you can fix the problem by first encoding your data (using str.encode), then stripping the extra backslashes from the already encoded characters, then finally decoding again:

str_unicode = "\\u201c借\\u201d东风"  # or somefile.read(), or whatever

fixed = str_unicode.encode('unicode-escape').replace(b'\\\\', b'\\').decode('unicode-escape')

print(fixed)  # prints “借”东风
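
To see what each step of that one-liner does, it can be unrolled as below. This is only a sketch; the helper name `unescape_mixed` is made up for illustration and is not part of any library.

```python
def unescape_mixed(text):
    """Decode literal backslash-u escapes in a str that also holds real non-ASCII characters."""
    # Step 1: escape everything to ASCII bytes. Real characters such as 借
    # become \u501f, and every literal backslash gets doubled.
    escaped = text.encode('unicode-escape')
    # Step 2: collapse the doubled backslashes, so the escapes that were
    # already in the text look like the ones produced in step 1.
    collapsed = escaped.replace(b'\\\\', b'\\')
    # Step 3: decode all the escapes in a single pass.
    return collapsed.decode('unicode-escape')


str_unicode = "\\u201c借\\u201d东风"
print(unescape_mixed(str_unicode))  # prints “借”东风
```

Note that step 2 collapses every literal backslash, so this assumes the input contains no backslashes other than the ones that start an escape sequence.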

Thanks, your solution works, and your guess is right: I use https://github.com/hardikvasa/google-images-download to extract image metadata to a JSON file, then I get str_unicode from this JSON file. — GoTop, Apr 21 '19 at 06:44
@GoTop: I'm glad this answer was useful to you. If you think it is the best answer to your question, please consider [clicking the check mark to the left of it to accept it](https://stackoverflow.com/help/accepted-answer). — Blckknght, Apr 21 '19 at 07:28
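
Since the string turned out to come from a JSON file, another option may be to let the `json` module decode the escapes while parsing, instead of reading the file as plain text. A minimal sketch, assuming the value is stored as a JSON string; the key name `image_description` and the inline sample are made up for illustration and may not match the real file:

```python
import json

# Hypothetical sample of what the metadata file might contain on disk.
# The raw string keeps the backslashes exactly as they would appear in the file.
raw_json = r'{"image_description": "\u201c借\u201d东风"}'

data = json.loads(raw_json)  # the JSON parser decodes \uXXXX escapes itself
print(data["image_description"])  # prints “借”东风
```

For a real file, `json.load(f)` on a file opened with `encoding='utf-8'` would play the same role as `json.loads` here.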

score 1 · Answer 2 · answered Apr 21 '19 at 06:23

You're not escaping the characters correctly; you have an extra backslash (\):

>>> print("\u201c借\u201d东风")
“借”东风

Francisco

Yes, manually deleting the "\" works, but I have to do it in the Python script. – GoTop Apr 21 '19 at 06:45

score -2 · Answer 3 · answered Apr 21 '19 at 06:28

The Unicode standard contains a lot of tables listing characters and their corresponding code points:

0061    'a'; LATIN SMALL LETTER A
0062    'b'; LATIN SMALL LETTER B
0063    'c'; LATIN SMALL LETTER C
...
007B    '{'; LEFT CURLY BRACKET
...
2167    'Ⅷ': ROMAN NUMERAL EIGHT
2168    'Ⅸ': ROMAN NUMERAL NINE
...
265E    '♞': BLACK CHESS KNIGHT
265F    '♟': BLACK CHESS PAWN
...
1F600   '😀': GRINNING FACE
1F609   '😉': WINKING FACE
...

You can find out more in the Python 3 documentation at this link: Unicode Python 3
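
As a small illustration of such a table, the standard `unicodedata` module can look up the name for a character. This is just a sketch; the sample characters are picked arbitrarily, with the two quotation marks from the question included.

```python
import unicodedata

# Print code point, character, and official name for a few sample characters.
for ch in "a{♞\u201c\u201d":
    print(f"{ord(ch):04X}    {ch!r}: {unicodedata.name(ch)}")
```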

Saad Ahmad

This doesn't seem to answer the question that was asked at all. Just linking to the Python docs is not nearly good enough. – Blckknght Apr 21 '19 at 06:32
I know the Unicode escape \u201c means “ and \u201d means ”, but I have to make these escapes print as the right characters in the console, so your answer doesn't help. But thanks anyway. – GoTop Apr 21 '19 at 06:35
