Suppose I have the following string that I want to decode as utf-8:

str = '\\u00d7\\u0090\\u00d7\\u0090\\u00d7\\u0090'
# expect 'אאא'

Using Python 3, I would expect one of the following to work, but neither does:

bytes(str, 'ascii').decode('unicode-escape')
# prints '×××'
bytes(str, 'ascii').decode('utf-8')
# prints '\\u00d7\\u0090\\u00d7\\u0090\\u00d7\\u0090'

Any help?

user2129817

1 Answer

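You can do it with multiple trips through encode/decode. Here st holds the string from the question (named st rather than str so it doesn't shadow the built-in type):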

print(st.encode('ascii').decode('unicode-escape').encode('iso-8859-1').decode('utf-8'))

The first step, encode('ascii'), is the preferred alternative to bytes(st, 'ascii'). The second, decode('unicode-escape'), converts the escape sequences to their equivalent characters. The third, encode('iso-8859-1'), takes advantage of the fact that the first 256 Unicode code points match ISO-8859-1, so it converts those characters directly back into bytes with the same values. Finally, decode('utf-8') decodes those bytes as UTF-8.
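To make the chain easier to follow, here is a minimal sketch that splits it into one step per line. The intermediate names (raw, chars, utf8_bytes, result) are just illustrative, and it assumes the input contains only ASCII characters and \u00XX escapes, as in the question.

```python
# The string from the question: each "\\" in the literal is one real backslash,
# so st contains the six-character sequences \u00d7 and \u0090, not the characters.
st = '\\u00d7\\u0090\\u00d7\\u0090\\u00d7\\u0090'

# Step 1: str -> bytes (works because the text is pure ASCII).
raw = st.encode('ascii')

# Step 2: interpret the \uXXXX escapes; yields U+00D7 and U+0090 characters.
chars = raw.decode('unicode-escape')

# Step 3: map code points 0-255 back to single bytes with the same values.
utf8_bytes = chars.encode('iso-8859-1')

# Step 4: decode those bytes as UTF-8; D7 90 is the encoding of U+05D0 (alef).
result = utf8_bytes.decode('utf-8')

# Compare against three alefs written as escapes, to avoid console encoding issues.
print(result == '\u05d0' * 3)
```

If the escapes ever include code points above U+00FF, step 3 would raise UnicodeEncodeError, which is a useful signal that the input was not double-encoded in this way.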

Community
Mark Ransom