How to convert an email subject from "?UTF-8?...?=" to a readable string?

Question

Possible Duplicate:
string encode / decode

Now the subject looks like: =?UTF-8?B?0J/RgNC+0LLQtdGA0LrQsA==?=

Out of close votes. http://stackoverflow.com/questions/4896194/string-encode-decode — Ignacio Vazquez-Abrams, Mar 10 '11 at 12:30

score 11 · Answer 1 · answered Mar 10 '11 at 12:29

Maybe you can use the `decode_header` function: http://docs.python.org/library/email.header.html#email.header.decode_header

gruszczy

Thanks for replying, but the result is not good: [('\xd0\x9f\xd1\x80\xd0\xbe\xd0\xb2\xd0\xb5\xd1\x80\xd0\xba\xd0\xb0', 'utf-8')] – anton Mar 10 '11 at 12:37

You can convert that result into a unicode string, by using `unicode(*result[0])`. – gnud Mar 10 '11 at 13:00
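For readers on Python 3, where there is no `unicode` builtin, the same idea can be sketched as below. This is only an illustration, not code from the thread: the helper name `decode_subject` is hypothetical, and falling back to ASCII for parts without a charset is an assumption. In Python 3, `decode_header` returns bytes paired with a charset name for encoded parts, so each part is decoded and the pieces are joined.

```python
from email.header import decode_header


def decode_subject(raw_subject):
    """Turn an encoded header value into a readable str (hypothetical helper)."""
    parts = []
    for payload, charset in decode_header(raw_subject):
        if isinstance(payload, bytes):
            # Encoded parts arrive as bytes plus a charset name.
            # charset can be None for unencoded bytes; ASCII is assumed there.
            parts.append(payload.decode(charset or "ascii", errors="replace"))
        else:
            # A header without any encoded parts comes back as a plain str.
            parts.append(payload)
    return "".join(parts)


if __name__ == "__main__":
    print(decode_subject("=?UTF-8?B?0J/RgNC+0LLQtdGA0LrQsA==?="))
```

With the subject from the question, this should give the same text that `unicode(*result[0])` produces in Python 2.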

score 11 · Accepted Answer · answered Mar 10 '11 at 12:56

The part between =?UTF-8?B? and ?= is a base64-encoded string. Extract that part, and then decode it.

import base64

#My buggy SSH account needs this to write unicode output, you hopefully won't
import sys
import codecs
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)


encoded = '=?UTF-8?B?0J/RgNC+0LLQtdGA0LrQsA==?='
prefix = '=?UTF-8?B?'
suffix = '?='

#extract the data part of the string
middle = encoded[len(prefix):len(encoded)-len(suffix)]
print "Middle: %s" % middle

#decode the bytes
decoded = base64.b64decode(middle)
#decode the utf-8
decoded = unicode(decoded, 'utf8')

print "Decoded: %s" % decoded

Output:

Middle: 0J/RgNC+0LLQtdGA0LrQsA==
Decoded: Проверка
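The same manual approach can be sketched for Python 3, reading the charset out of the string instead of hard-coding the prefix. The `=?charset?B?data?=` form is the "encoded-word" syntax from RFC 2047. The names `ENCODED_WORD` and `decode_b_word` are hypothetical, and the sketch assumes a single base64 ("B") encoded word; it does not handle the "Q" encoding or subjects made of several words.

```python
import base64
import re

# One base64 encoded-word: =?charset?B?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[Bb]\?([^?]*)\?=")


def decode_b_word(word):
    """Decode a single base64 encoded-word into a str (hypothetical helper)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not a base64 encoded-word: %r" % word)
    charset, payload = match.groups()
    # First undo the base64 layer, then decode the bytes with the named charset.
    raw = base64.b64decode(payload)
    return raw.decode(charset)


if __name__ == "__main__":
    print(decode_b_word("=?UTF-8?B?0J/RgNC+0LLQtdGA0LrQsA==?="))
```

For real mail, `email.header.decode_header` is still the safer starting point, since it also covers the cases this sketch skips.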

gnud

So much work to replace 2 lines of correct code... – Ignacio Vazquez-Abrams Mar 10 '11 at 13:04

Yes, using `email.header.decode_header` seems like a better start than my substring mess. I still explained what was going on, though, and how to convert the result from decode_header to a unicode string. – gnud Mar 10 '11 at 13:14
What standard would these UTF-8 subjects be based on? – Gert van den Berg May 19 '16 at 07:38
