In Python I have a Unicode escape sequence that was originally part of a Chinese text. I'm trying to display it properly, i.e. convert it into a Unicode string.
Searching SO, I've tried several of the proposed solutions, but none of them work.
Here's what I have:
```python
import re
import codecs
import urllib

ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\U........      # 8-digit hex escapes
    | \\u....          # 4-digit hex escapes
    | \\x..            # 2-digit hex escapes
    | \\[0-7]{1,3}     # Octal escapes
    | \\N\{[^}]+\}     # Unicode characters by name
    | \\[\\'"abfnrtv]  # Single-character escapes
    )''', re.UNICODE | re.VERBOSE)


def decode_escapes(s):
    def decode_match(match):
        return codecs.decode(match.group(0), 'unicode-escape')
    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)


print(decode_escapes('\u6240\u8BF7\u6C42\u7684\u8FD4\u7A0B\u65E5'))
```
Running this code fails with the following error:
```
Traceback (most recent call last):
  File "Test.py", line 21, in <module>
    print(decode_escapes('\u6240\u8BF7\u6C42\u7684\u8FD4\u7A0B\u65E5'))
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-6: character maps to <undefined>
```
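The traceback ends inside `encodings\cp437.py`, which suggests the error is raised when `print` encodes the result for the console (code page 437), not inside `decode_escapes` itself. A minimal Python 3 sketch, assuming cp437 is the console encoding as the path in the traceback indicates, reproduces the same kind of `UnicodeEncodeError` without involving a console at all:

```python
text = '\u6240\u8bf7\u6c42\u7684\u8fd4\u7a0b\u65e5'  # 所请求的返程日

cp437_error = None
try:
    # Code page 437 has no Chinese characters, so this should fail
    # in the same way print() did in the traceback.
    text.encode('cp437')
except UnicodeEncodeError as exc:
    cp437_error = exc
    print(exc)

# UTF-8 can represent every code point; these CJK characters take 3 bytes each.
utf8_bytes = text.encode('utf-8')
print(len(utf8_bytes))  # 21
```

If the encode to cp437 fails while the UTF-8 encode succeeds, the decoded string itself is fine and only the output encoding is the problem.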
The expected output is:
所请求的返程日
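Each `\uXXXX` escape is the hexadecimal code point of one character in the expected string. A small Python 3 sketch checks this mapping by encoding the expected text back with the `unicode_escape` codec (Python writes the hex digits in lowercase, while the input above uses uppercase, which decodes the same):

```python
expected = '所请求的返程日'

# One code point per character, written as hex.
code_points = [hex(ord(ch)) for ch in expected]
print(code_points)

# Encoding with unicode_escape turns each character back into a \uXXXX escape.
round_trip = expected.encode('unicode_escape').decode('ascii')
print(round_trip)
```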
What can I do to see the correct string?
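For reference, here is a minimal sketch of the same function under Python 3 (an assumption: the traceback's `C:\Python27` path shows the original runs on Python 2.7). The input is a raw string so the backslashes actually reach the regex, and the result is compared with the expected text rather than printed, which keeps the console encoding out of the check:

```python
import codecs
import re

# Same pattern as in the question: one alternative per kind of escape.
ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\U........      # 8-digit hex escapes
    | \\u....          # 4-digit hex escapes
    | \\x..            # 2-digit hex escapes
    | \\[0-7]{1,3}     # Octal escapes
    | \\N\{[^}]+\}     # Unicode characters by name
    | \\[\\'"abfnrtv]  # Single-character escapes
    )''', re.UNICODE | re.VERBOSE)


def decode_escapes(s):
    """Replace every backslash escape in s with the character it stands for."""
    def decode_match(match):
        # Decode only the matched escape, leaving the surrounding text untouched.
        return codecs.decode(match.group(0), 'unicode-escape')
    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)


# Raw string: the backslashes stay in the text, as if it were read from a file.
escaped = r'\u6240\u8BF7\u6C42\u7684\u8FD4\u7A0B\u65E5'
decoded = decode_escapes(escaped)
expected = '所请求的返程日'

# Compare instead of printing, so no console encoding is involved.
print(decoded == expected)  # True
print(ascii(decoded))
```

Note that in Python 3 a plain (non-raw) literal such as `'\u6240'` is already decoded by the compiler, so the regex would find nothing to replace in it.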