4

How can I decode a string containing something like this:

sta&#195;&#159;e

to

staße

using Python?

(EDIT: Interpreting the source as HTML entities does not give the desired result; it yields "staÃe" instead.)

Background: I am struggling to work with the Amazon MWS response strings returned by the mws client you get from pip install mws. I am especially puzzled because the source string looks like it contains two special characters, while the goal is just 'ß'.

The docs mention a Unicode character limit that I did not understand.

Telcrome

1 Answer

3

The problem is that ß is represented in UTF-8 as a sequence of two bytes: C3 9F in hex, or 195 159 in decimal. When you decode the entities as HTML, each one becomes a separate Unicode code point, 195 and 159, and code point 195 is Ã. To undo this, you have to convert the str back to bytes (Latin-1 maps code points 0 to 255 onto the same byte values) and then decode those bytes as UTF-8 to get the intended (Unicode) str. Compare the results of:

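# Code points 195 and 159 taken as two characters: 'Ã' plus an invisible control character
print('\xc3\x9f')

# Turn the same code points back into the bytes C3 9F via Latin-1, then decode as UTF-8: 'ß'
print(bytes('\xc3\x9f', 'Latin-1').decode('utf-8'))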
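
The same idea can be applied straight to the raw string, without going through an HTML entity decoder first. What follows is only a minimal sketch. It assumes the response carries each UTF-8 byte as a decimal numeric entity (for example `&#195;&#159;` for ß). The helper name `decode_byte_entities` is made up for illustration and is not part of the mws client.

```python
import re

# One or more consecutive decimal numeric entities, e.g. "&#195;&#159;"
_ENTITY_RUN = re.compile(r'(?:&#\d+;)+')


def decode_byte_entities(text):
    """Hypothetical helper: treat each run of decimal entities as raw UTF-8 bytes.

    Assumes every entity value is a single byte (0-255) and that each run
    forms valid UTF-8. Otherwise bytes() raises ValueError or decode()
    raises UnicodeDecodeError.
    """
    def _decode_run(match):
        # Pull the decimal values out of the run: "&#195;&#159;" -> [195, 159]
        values = [int(n) for n in re.findall(r'&#(\d+);', match.group(0))]
        # Reassemble the byte sequence and decode it as UTF-8
        return bytes(values).decode('utf-8')

    return _ENTITY_RUN.sub(_decode_run, text)


print(decode_byte_entities('sta&#195;&#159;e'))  # staße
```

Decoding run by run leaves the rest of the string untouched, so text that already contains proper non-ASCII characters is not pushed through the Latin-1 round trip.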
Błotosmętek
  • Thanks, your snippet in combination with setting UTF-8 as the encoding on the output XML file solved the problem. – Telcrome Sep 26 '17 at 12:48