I have some problems with strangely escaped Unicode strings. My script consumes a web service via the requests library, and response.text contains the following Unicode string:
u'\\u003c? abc ?\\u003eDas Modell des Adaptiven Zyklus wurde aus vergleichenden Untersuchungen zur Dynamik von \xd6kosystemen abgeleitet.\\u003c? /abc ?\\u003e'
**Update:** Martijn's solution works with the string above, but breaks with the following one because of `len="12"`:
u'\\u003c?abc len="12"?\\u003eResilienz sollte als st\xe4ndiger Anpassungsprozess zwischen Systemen und der Umwelt begriffen werden.\\u003c? /abc ?\\u003e'
The response from the server looks something like this:
\u003c? abc ?\u003eDas Modell des Adaptiven Zyklus wurde aus vergleichenden Untersuchungen zur Dynamik von Ökosystemen abgeleitet.\u003c?dpf /sent ?\u003e
The problem is the double-escaped Unicode sequences such as \u003c, which normally represents the < character. \xd6, on the other hand, is correct and represents the German Ö. This double escaping completely messes up my Unicode string :-)
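To make the difference concrete, here is a small Python 3 illustration (the variable names are just for this example): a double-escaped sequence in response.text is six separate characters, while a correctly decoded one such as \xd6 is already a single character.

```python
escaped = '\\u003c'  # what response.text contains: backslash, 'u', '0', '0', '3', 'c'
decoded = '\u003c'   # what the server meant: the single character '<'
umlaut = '\xd6'      # already decoded correctly: the single character 'Ö'

print(len(escaped), len(decoded), len(umlaut))  # 6 1 1
print(decoded == '<', umlaut == 'Ö')            # True True
```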
I found a similar problem in this post: Stack Overflow - Conversion of strings like \uXXXX in python
The solution there, string.decode('unicode-escape'), only seems to work when all Unicode sequences are escaped, not with a mix of single and double escapes. Simply replacing the double escapes with single ones gives me a corrupt Unicode string.
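One possible workaround, sketched here as an alternative and not necessarily the approach from the linked post or from Martijn's answer, is to decode only the literal \uXXXX sequences with a regular expression and leave everything else (already-decoded characters like Ö or ä, double quotes in `len="12"`) untouched. This is written for Python 3 (`chr`); on Python 2 you would use `unichr` and unicode literals. The helper name `decode_literal_escapes` is hypothetical, and the sketch assumes every backslash followed by `u` and four hex digits is meant as an escape.

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits,
# e.g. the six characters \u003c that appear in response.text.
_ESCAPE_RE = re.compile(r'\\u([0-9a-fA-F]{4})')


def decode_literal_escapes(text):
    r"""Replace literal \uXXXX sequences with the characters they encode.

    Hypothetical helper: characters that are already decoded (such as 'Ö')
    and everything else, including double quotes, pass through unchanged.
    """
    return _ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), text)


raw = ('\\u003c?abc len="12"?\\u003eResilienz sollte als st\xe4ndiger '
       'Anpassungsprozess zwischen Systemen und der Umwelt begriffen '
       'werden.\\u003c? /abc ?\\u003e')
print(decode_literal_escapes(raw))
```

Because the regex only touches the escape sequences, the mix of single and double escapes is no longer a problem. Note that this simple version does not combine UTF-16 surrogate pairs (e.g. emoji sent as two \uXXXX escapes) and does not treat an escaped backslash before `u` specially.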
The easiest and best solution would be to adjust the response encoding on the server side, but I have no access ...
Thanks for your help!