In Python I have a string of Unicode escape sequences that was originally part of some Chinese text. I'm trying to display it properly (that is, to convert it into a Unicode string). I searched SO and tried several of the suggested approaches, but none of them work.
Here's what I have:

import re
import codecs
import urllib

ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\U........      # 8-digit hex escapes
    | \\u....          # 4-digit hex escapes
    | \\x..            # 2-digit hex escapes
    | \\[0-7]{1,3}     # Octal escapes
    | \\N\{[^}]+\}     # Unicode characters by name
    | \\[\\'"abfnrtv]  # Single-character escapes
    )''', re.UNICODE | re.VERBOSE)

def decode_escapes(s):
    def decode_match(match):
        return codecs.decode(match.group(0), 'unicode-escape')

    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)


print(decode_escapes('\u6240\u8BF7\u6C42\u7684\u8FD4\u7A0B\u65E5'))

Running this code fails with the following error:

Traceback (most recent call last):
  File "Test.py", line 21, in <module>
    print(decode_escapes('\u6240\u8BF7\u6C42\u7684\u8FD4\u7A0B\u65E5'))
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-6: character maps to <undefined>

The output should look like this:

所请求的返程日

What can I do to see the correct string?

WhiteAngel
  • It is the **print call** that causes the error, your regex worked (although it is too broad, you should limit yourself to hex digits rather than use `.`). But why use a regex when you could just use `string.decode('unicode-escape')` on the *whole string*? – Martijn Pieters Nov 20 '14 at 15:12
  • I tried `string.decode`, but it gave me the same kind of error, which is why I thought it was the wrong way to do this. Thank you for pointing out the `print` call. I will try to avoid it ;) – WhiteAngel Nov 20 '14 at 15:16
  • @MartijnPieters, thanks a lot Martijn, it works like a charm. Good luck and have a nice end of the day) – WhiteAngel Nov 20 '14 at 15:22
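
The point made in the comments above can be sketched roughly as follows. This is only an illustration, not necessarily the exact code that ended up being used: the names `raw`, `text`, `console_ok`, `safe`, and `utf8` are made up for the example, and `cp437` is assumed as the console encoding because it appears in the traceback. The input is written as a bytes literal so that the same lines run on Python 2.7 and Python 3 (where `str` no longer has a `decode` method).

```python
# The escaped text as bytes; every backslash here is a literal character.
raw = b'\\u6240\\u8BF7\\u6C42\\u7684\\u8FD4\\u7A0B\\u65E5'

# Decode the whole string in one step; no regex is needed for this input.
text = raw.decode('unicode-escape')

# Decoding succeeds. The failure only shows up when the text is encoded
# for a console whose code page has no Chinese characters (cp437 here).
try:
    text.encode('cp437')
    console_ok = True
except UnicodeEncodeError:
    console_ok = False

# Two possible fallbacks (assumptions, not taken from the question):
# keep the unencodable characters as escapes for the console ...
safe = text.encode('cp437', 'backslashreplace')
# ... or encode as UTF-8 and write the bytes to a file instead of printing.
utf8 = text.encode('utf-8')
```

So the decoding step and the display step fail independently: `text` already holds the seven Chinese characters, and whether `print` can show them depends on the console encoding rather than on how the escapes were decoded.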
