2

I'm programming using Python 3.x. Say I have the following Unicode string:

my_string =' \xed\x95\x9c'

'\xed\x95\x9c' is actually the UTF-8 byte stream for the Korean character . What's the easiest way to convert my_string to ? my_string.decode('utf-8') doesn't work because my_string is a Unicode string, not a byte string.

Adam
  • 8,752
  • 12
  • 54
  • 96

1 Answers1

3

There are many possible encode/decode chains which lead to the desired result. Here is one:

In [257]: '\xed\x95\x9c'.encode('latin-1').decode('utf-8')
Out[257]: '한'

Here is the code I used to find this encode/decode chain.

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677