Python Unicode string stored as '\u84b8\u6c7d\u5730' in file, how to convert it back to Unicode?

Question

Some Unicode data is stored in a file as the literal escaped text '\u84b8\u6c7d\u5730', without any encoding applied.

Is there a way to convert it back in Python?

Do you mean `'\\u84b8\\u6c7d\\u5730'` or as `u'\u84b8\u6c7d\u5730'`? — Chris Morgan, Jun 19 '12 at 04:34
@Chris: No need to escape the backslashes, as `\u` isn't a valid escape in bytestrings. — Ignacio Vazquez-Abrams, Jun 19 '12 at 04:37
@IgnacioVazquez-Abrams: I know; I put it with the doubled backslashes to make my meaning more obvious — Chris Morgan, Jun 19 '12 at 04:37
As you've accepted Ignacio's answer, this must be a duplicate of [How do I treat an ASCII string as unicode and unescape the escaped characters in it in python?](http://stackoverflow.com/questions/267436/how-do-i-treat-an-ascii-string-as-unicode-and-unescape-the-escaped-characters-in) — Chris Morgan, Jun 19 '12 at 04:47
I agree. I just cannot find out the right article for this issue. — lucemia, Jun 19 '12 at 04:48

score 48 · Accepted Answer · answered Jun 19 '12 at 04:35

>>> print '\u84b8\u6c7d\u5730'.decode('unicode-escape')
蒸汽地
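
The snippet above is Python 2, where a plain quoted literal is a byte string. In Python 3, `str` has no `.decode()` method, so a rough equivalent is to read the file as bytes and decode those. This is only a sketch: it assumes the file contains the backslash-u sequences as plain ASCII text, and the file name `escaped.txt` and the helper `read_escaped` are hypothetical.

```python
import os
import tempfile

def read_escaped(path):
    """Read a file of literal backslash-u escapes and return the decoded text."""
    with open(path, "rb") as f:  # binary mode: we want bytes, not str
        raw = f.read()
    return raw.decode("unicode_escape")

# Demo with a throwaway file whose content mirrors the question.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "escaped.txt")
    with open(path, "wb") as f:
        f.write(b"\\u84b8\\u6c7d\\u5730")  # 18 ASCII bytes, not 3 characters
    decoded = read_escaped(path)

print(decoded)  # 蒸汽地
```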

Ignacio Vazquez-Abrams

I think this is probably what he means, but I'm not sure... if it is, it's a duplicate, anyway. – Chris Morgan Jun 19 '12 at 04:36
That's good. Here's another alternate form: `s= unicode('\u84b8\u6c7d\u5730', "unicode-escape")`. – Keith Jun 19 '12 at 04:38
I spent a lot of time trying to solve this problem, now I saw your solution! – smohamed Aug 09 '16 at 18:11
amazing! This should have been part of the lengthy treatise on handling unicode in python. Standard docs are lacking on this fix. – Marc Maxmeister Jul 18 '17 at 14:57
`print(u'\u84b8\u6c7d\u5730')` – Vladislav May 29 '23 at 07:19

score 1 · Answer 2 · answered Aug 25 '23 at 06:13

This code helped me to decode the string in Python 3:

text = '\\u041d\\u0435\\u0442 \\u043f\\u0430\\u0440\\u0430\\u043c\\u0435\\u0442\\u0440\\u0430'
res = text.encode().decode('unicode_escape')
print(res)

`encode()` converts the str to a bytes object (UTF-8 by default).
`decode('unicode_escape')` converts that bytes object back to a str using the `unicode_escape` codec. See Python 3 Standard Encodings.
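
One caveat worth knowing: `encode()` produces UTF-8, but the `unicode_escape` codec decodes bytes as Latin-1, so any real non-ASCII character already present in the string gets mangled. A possible workaround, sketched below, is to encode with Latin-1 and the `backslashreplace` error handler instead. The helper name `unescape` and the sample string are made up for illustration.

```python
def unescape(text):
    """Decode literal backslash escapes in a str, keeping real non-ASCII characters."""
    # Latin-1 maps code points 0-255 to single bytes; anything above that is
    # rewritten as a backslash escape, which unicode_escape then decodes back.
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")

# A real 'é' followed by escaped Cyrillic text.
mixed = "caf\u00e9 \\u041d\\u0435\\u0442"

naive = mixed.encode().decode("unicode_escape")  # UTF-8 bytes read as Latin-1
fixed = unescape(mixed)

print(naive)  # the 'é' comes out as two wrong characters
print(fixed)  # café Нет
```

For pure-ASCII input such as the string in this answer, both approaches give the same result.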
