I'm receiving SNS notifications from a topic that forwards emails to a Django app. The JSON in the request body's Message.content contains oddly encoded UTF-8 (e.g. "=C3=A8" representing "è") and also some stray "= " sequences.

I'm trying to decode it before loading the JSON:

body = request.body.decode('utf-8')

body_unicode = unicode(body)
js = json.loads(body_unicode.replace('\n', ''))

But this doesn't work: the "=C3=A8" substrings are still present in body_unicode.

A Magoon
`decode` already translates bytes to a unicode string, so why call `unicode` on the decoded result? And what is `=C3=A8`? The URL escape code of `è` is `%C3%A8`. – stamaimer Aug 11 '17 at 15:02

1 Answer

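These are quoted-printable characters, an encoding used in emails. Each "=XX" sequence stands for one byte written in hexadecimal, and a "=" at the end of a line is a soft line break. They can be converted to a normal string in Python as shown below: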

Python 3.6.1 (default, Apr  4 2017, 09:40:21)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import quopri
>>> data = quopri.decodestring("=C3=A8")
>>> data
b'\xc3\xa8'
>>> data.decode("utf-8")
'è'
>>>
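Applied to the situation in the question, a minimal sketch might look like the following. It assumes the quoted-printable text sits in a string field of the JSON (here called "content", as in the question) and that the charset is UTF-8. The helper name `decode_quoted_printable` and the sample payload are hypothetical, and a real SNS notification has more fields. The JSON is parsed first so that escaped line breaks become real ones, which lets `quopri` drop the "=" soft line breaks as well.

```python
import json
import quopri


def decode_quoted_printable(text, charset="utf-8"):
    """Decode a quoted-printable string such as "=C3=A8" into readable text."""
    # quopri works on bytes: each "=XX" becomes one byte, and a "=" followed
    # by a line break (a soft line break) is removed.
    raw = quopri.decodestring(text.encode(charset))
    return raw.decode(charset)


# Hypothetical payload standing in for the request body (assumption).
body = b'{"content": "caff=C3=A8 is re=\\r\\nady"}'

# Parse the JSON first, then decode the field that holds the email text.
payload = json.loads(body.decode("utf-8"))
decoded = decode_quoted_printable(payload["content"])
print(decoded)
```

If the email declares a different charset or transfer encoding, the standard library's `email` package is probably a better fit than decoding the field by hand.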

For more details, refer to "How to understand the equal sign '=' symbol in IMAP email text?"

Tarun Lalwani