Unicode string to Unicode character, Python 3

Question

I'm programming using Python 3.x. Say I have the following Unicode string:

my_string = '\xed\x95\x9c'

'\xed\x95\x9c' is actually the UTF-8 byte stream for the Korean character 한. What's the easiest way to convert my_string to 한? my_string.decode('utf-8') doesn't work because my_string is a Unicode string, not a byte string.

score 3 · Accepted Answer · answered Jun 16 '17 at 23:33

There are many possible encode/decode chains which lead to the desired result. Here is one:

In [257]: '\xed\x95\x9c'.encode('latin-1').decode('utf-8')
Out[257]: '한'
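
To see why this particular chain works, here is a minimal sketch. Latin-1 maps each code point from U+0000 to U+00FF to the single byte with the same value, so encoding with it recovers the original bytes, which can then be decoded as UTF-8. The variable names raw and result are only illustrative.

```python
my_string = '\xed\x95\x9c'  # three code points: U+00ED, U+0095, U+009C

# Latin-1 turns each code point in the range 0-255 into the byte of the same value.
raw = my_string.encode('latin-1')
print(raw)  # b'\xed\x95\x9c'

# Those three bytes are the UTF-8 encoding of a single character, 한 (U+D55C).
result = raw.decode('utf-8')
print(len(result), hex(ord(result)))  # 1 0xd55c
```

This only works when every character in the string has a code point below 256; anything above that raises UnicodeEncodeError on the encode step.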

Here is the code I used to find this encode/decode chain.
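
As a rough illustration of how such a search could be done (a hypothetical sketch, not the answerer's actual script), one option is to brute-force every encode/decode pair from a list of candidate codecs and keep the pairs that produce the target. The function name find_chains and the short codec list are assumptions made for this example.

```python
import itertools

# Candidate codecs: an arbitrary short list chosen for illustration.
CODECS = ['latin-1', 'cp1252', 'utf-8', 'utf-16', 'cp437']

def find_chains(text, target, codecs=CODECS):
    """Return (encode_codec, decode_codec) pairs that turn text into target."""
    chains = []
    for enc, dec in itertools.product(codecs, repeat=2):
        try:
            if text.encode(enc).decode(dec) == target:
                chains.append((enc, dec))
        except UnicodeError:
            # This pair cannot encode the text or cannot decode the bytes; skip it.
            continue
    return chains

chains = find_chains('\xed\x95\x9c', '\ud55c')  # '\ud55c' is 한
print(chains)
```

A longer codec list finds more chains, but a match only shows that the pair works for this input, not that it is how the string was originally mangled.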

unutbu
