Some Unicode data is stored in a file as escape sequences such as '\u84b8\u6c7d\u5730', without any encoding applied.

Is there a way to convert them back in Python?

dda
lucemia
  • Do you mean `'\\u84b8\\u6c7d\\u5730'` or `u'\u84b8\u6c7d\u5730'`? – Chris Morgan Jun 19 '12 at 04:34
  • @Chris: No need to escape the backslashes, as `\u` isn't a valid escape in bytestrings. – Ignacio Vazquez-Abrams Jun 19 '12 at 04:37
  • @IgnacioVazquez-Abrams: I know; I put it with the doubled backslashes to make my meaning more obvious – Chris Morgan Jun 19 '12 at 04:37
  • As you've accepted Ignacio's answer, this must be a duplicate of [How do I treat an ASCII string as unicode and unescape the escaped characters in it in python?](http://stackoverflow.com/questions/267436/how-do-i-treat-an-ascii-string-as-unicode-and-unescape-the-escaped-characters-in) – Chris Morgan Jun 19 '12 at 04:47
  • I agree. I just could not find the right question for this issue. – lucemia Jun 19 '12 at 04:48

2 Answers

>>> print '\u84b8\u6c7d\u5730'.decode('unicode-escape')
蒸汽地
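
The session above is Python 2, where a plain string literal is a byte string. A rough Python 3 equivalent, assuming the escapes arrive as bytes (for example, from a file opened in binary mode), might look like the sketch below. The variable names `raw` and `decoded` are illustrative only.

```python
# Raw bytes literal: the backslashes are kept literally, as they would be in the file.
raw = rb'\u84b8\u6c7d\u5730'

# The unicode_escape codec turns each \uXXXX sequence into the character it names.
decoded = raw.decode('unicode_escape')
print(decoded)
```

Using a raw bytes literal (`rb'...'`) avoids the invalid-escape warning that newer Python 3 versions emit for `\u` inside an ordinary bytes literal.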
Ignacio Vazquez-Abrams

This code helped me to decode the string in Python 3:

text = '\\u041d\\u0435\\u0442 \\u043f\\u0430\\u0440\\u0430\\u043c\\u0435\\u0442\\u0440\\u0430'
res = text.encode().decode('unicode_escape')
print(res)
  • encode() - convert a str to a bytes object
  • decode('unicode_escape') - convert a bytes object to a str using codec unicode_escape. See Python 3 Standard Encodings.
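
One caveat with the two steps above: `encode()` defaults to UTF-8, while `unicode_escape` reads bytes as Latin-1, so text that mixes escapes with real non-ASCII characters can come out garbled. A minimal sketch of one way around this, assuming Python 3, is shown below. The helper name `unescape_unicode` is hypothetical and not part of any library.

```python
def unescape_unicode(text: str) -> str:
    """Decode literal \\uXXXX sequences in text (hypothetical helper)."""
    # Encode as Latin-1 so characters up to U+00FF map to single bytes;
    # backslashreplace rewrites anything beyond that as a \uXXXX or
    # \UXXXXXXXX escape, which the decode step then restores.
    data = text.encode('latin-1', 'backslashreplace')
    return data.decode('unicode_escape')


escaped = '\\u041d\\u0435\\u0442'
print(unescape_unicode(escaped))

# Mixed input: a real non-ASCII character next to an escape sequence.
mixed = 'caf\u00e9 \\u84b8'
print(unescape_unicode(mixed))
```

For pure-ASCII input such as the example in this answer, this behaves the same as `text.encode().decode('unicode_escape')`.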
yesnik