is there any Python module which helps to decode the various forms of encoded mail headers, mainly Subject, to simple - say - UTF-8 strings?
Here are example Subject headers from mail files that I have:
Subject: [ 201105311136 ]=?UTF-8?B?IMKnIDE2NSBBYnM=?=. 1 AO;
Subject: [ 201105161048 ] GewSt:=?UTF-8?B?IFdlZ2ZhbGwgZGVyIFZvcmzDpHVmaWdrZWl0?=
Subject: [ 201105191633 ]
=?UTF-8?B?IERyZWltb25hdHNmcmlzdCBmw7xyIFZlcnBmbGVndW5nc21laHJhdWZ3ZW5kdW4=?=
=?UTF-8?B?Z2VuIGVpbmVzIFNlZW1hbm5z?=
text - encoded sting - text
text - encoded string
text - encoded string - encoded string
Encodig could also be something else like ISO 8859-15.
Update 1: I forgot to mention, I tried email.header.decode_header
for item in message.items():
if item[0] == 'Subject':
sub = email.header.decode_header(item[1])
logging.debug( 'Subject is %s' % sub )
This outputs
> DEBUG:root:Subject is [('[ 201101251025 ]
> ELStAM;=?UTF-8?B?IFZlcmbDvGd1bmcgdm9tIDIx?=. Januar 2011', None)]
which does not really help.
Update 2: Thanks to Ingmar Hupp in the comments.
the first example decodes to a list of two tupels:
> >>> print decode_header("""[ 201105161048 ]
> GewSt:=?UTF-8?B?IFdlZ2ZhbGwgZGVyIFZvcmzDpHVmaWdrZWl0?=""")
> [('[ 201105161048 ] GewSt:', None), (' Wegfall der Vorl\xc3\xa4ufigkeit',
> 'utf-8')]
is this always [(string, encoding),(string, encoding), ...] so I need a loop to concat all the [0] items to one string or how to get it all in one string?
> Subject: [ 201101251025 ] ELStAM;=?UTF-8?B?IFZlcmbDvGd1bmcgdm9tIDIx?=. Januar 2011
does not decode well:
> print decode_header("""[ 201101251025 ] ELStAM;=?UTF-8?B?IFZlcmbDvGd1bmcgdm9tIDIx?=. Januar 2011""")
>
>[('[ 201101251025 ] ELStAM;=?UTF-8?B?IFZlcmbDvGd1bmcgdm9tIDIx?=. Januar 2011', None)]