decode utf8 mail header

Question

In my MUA (Thunderbird 15.0.1) both mail subjects are displayed like this:

Keine Mail zu "Abschlagsänderung" gefunden

Here is a snippet to reproduce it:

import email

for subject in ['Subject: Re: Keine Mail zu "=?utf-8?q?Abschlags=C3=A4nderung?=" gefunden',
                'Subject: =?utf-8?q?Keine_Mail_zu_=22Abschlags=C3=A4nderung=22_gefunden?=']:
    msg=email.message_from_string(subject)
    print email.Header.decode_header(msg.get('subject'))

Output:

[('Re: Keine Mail zu "=?utf-8?q?Abschlags=C3=A4nderung?=" gefunden', None)]
[('Keine Mail zu "Abschlags\xc3\xa4nderung" gefunden', 'utf-8')]

Python cannot decode the first header, but Thunderbird can. The message was created by KMail/1.11.4.

How can I parse the first header with umlauts in Python 2.7?

Related: [email header decoding UTF-8](http://stackoverflow.com/questions/7331351/python-email-header-decoding-utf-8) — Ivan Chau, Sep 13 '15 at 07:29

score 2 · Accepted Answer · edited Oct 07 '21 at 10:55

According to RFC 2047,

An 'encoded-word' MUST NOT appear within a 'quoted-string'.

A 'quoted-string' according to RFC 822 is

quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or quoted chars.

So I think the Python library is right, as

"=?utf-8?q?Abschlags=C3=A4nderung?="

is a quoted string. A better alternative with minimal quoting would be

=?utf-8?q?=22Abschlags=C3=A4nderung=22?=

having the " encoded as =22.
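As a rough illustration of that point, the standard library's email.header.Header can be asked to encode the whole subject, quotes included. This is only a sketch: the variable names are made up for the example, and the exact encoded string may differ between Python versions, so it is printed rather than stated here.

```python
from email.header import Header, decode_header, make_header

text_type = type(u'')  # unicode on Python 2, str on Python 3

# Example subject taken from the question (\xe4 is the a-umlaut).
subject = u'Keine Mail zu "Abschlags\xe4nderung" gefunden'

# Encode the whole subject as UTF-8 encoded-word(s); the quotes are
# encoded together with the text instead of wrapping an encoded-word.
encoded = Header(subject, 'utf-8').encode()

# Decoding the result should give the original text back.
decoded = text_type(make_header(decode_header(encoded)))

print(encoded)
print(decoded == subject)
```

Because the quote characters end up inside the encoded-word, no 'quoted-string' is left for a strict parser to trip over.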

You could parse such headers by replacing each " with an encoded-word of its own, =?utf-8?q?=22?=, separated from its neighbours by whitespace:

>>> email.Header.decode_header('=?utf-8?q?=22?= =?utf-8?q?Abschlags=C3=A4nderung?= =?utf-8?q?=22?=')
[('"Abschlags\xc3\xa4nderung"', 'utf-8')]
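The replacement could be automated along these lines. This is a minimal sketch, not a general fix: requote and decode_subject are hypothetical helper names, the regular expression only handles an encoded-word enclosed directly in double quotes, and the quote itself is always re-encoded as UTF-8. The code uses the lowercase email.header module name and is meant to run on Python 2.7 as well as Python 3.

```python
import re
from email.header import decode_header, make_header

text_type = type(u'')  # unicode on Python 2, str on Python 3

# An encoded-word enclosed directly in double quotes: "=?charset?q?text?="
QUOTED_WORD = re.compile(r'"(=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=)"')
ENCODED_QUOTE = '=?utf-8?q?=22?='


def requote(value):
    """Hypothetical helper: turn "=?...?=" into three adjacent encoded-words."""
    return QUOTED_WORD.sub(
        lambda m: ' '.join((ENCODED_QUOTE, m.group(1), ENCODED_QUOTE)), value)


def decode_subject(value):
    """Hypothetical helper: decode a header value to a text string."""
    return text_type(make_header(decode_header(requote(value))))


raw = 'Re: Keine Mail zu "=?utf-8?q?Abschlags=C3=A4nderung?=" gefunden'
print(requote(raw))
print(repr(decode_subject(raw)))
```

Headers that contain no quoted encoded-word pass through requote unchanged, so the helper can be applied to every subject before decoding.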

answered Oct 17 '12 at 14:06

glglgl

Thank you very much for this answer. Since it is a bug in KMail, and this MUA is not very widespread, I will leave my code as it is. – guettli Oct 17 '12 at 19:10
I came across this bug in KMail again. The bug in KMail is still open and several years old: https://bugs.kde.org/show_bug.cgi?id=69007 – guettli May 14 '13 at 13:17
