
I have an email subject of the form:

=?utf-8?B?T3.....?=

The body of the email is UTF-8, base64-encoded, and has decoded fine. I am currently using Perl's Email::MIME module to decode the email.

What is the meaning of the =?utf-8 delimiter and how do I extract information from this string?

Jonathan S.
CoffeeMonster

5 Answers


Encoded-word tokens (as defined in RFC 2047) can occur in the values of some headers. They are parsed as follows:

=?<charset>?<encoding>?<data>?=

The charset is UTF-8 in this case, and the encoding is B, which means base64 (the other option is Q, a variant of quoted-printable).

To read it, first decode the base64 data, then interpret the resulting bytes as UTF-8.
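Those two steps can be sketched by hand with the core modules MIME::Base64 and Encode. This is a minimal illustration, not a replacement for a real RFC 2047 parser: the helper name `decode_encoded_word` is hypothetical, it assumes the input is exactly one well-formed encoded-word, and the sample string is made up for the example. It also handles Q, where `_` stands for a space and `=XX` is a hex-encoded byte.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode);

# Hypothetical helper: decode ONE encoded-word of the form
# =?<charset>?<encoding>?<data>?=  into a Perl character string.
sub decode_encoded_word {
    my ($word) = @_;
    my ($charset, $encoding, $data) =
        $word =~ /\A=\?([^?]+)\?([BbQq])\?([^?]*)\?=\z/
        or die "not a single encoded-word: $word\n";

    my $bytes;
    if (uc $encoding eq 'B') {
        # B: the data is plain base64
        $bytes = decode_base64($data);
    } else {
        # Q: like quoted-printable, but '_' stands for a space (0x20)
        ($bytes = $data) =~ tr/_/ /;
        $bytes =~ s/=([0-9A-Fa-f]{2})/chr hex $1/ge;
    }

    # The decoded bytes are in <charset>; turn them into characters
    return decode($charset, $bytes);
}

my $text = decode_encoded_word('=?utf-8?B?R3LDvMOfZQ==?=');
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";    # Grüße
```

The charset and encoding letter are matched case-insensitively here, since RFC 2047 treats them that way.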

For more detail, read the Internet mail RFCs, chiefly RFC 2047.

Since you are using Perl, Encode::MIME::Header could be of use:

SYNOPSIS

use Encode qw/encode decode/;
$utf8   = decode('MIME-Header', $header);   # encoded-words -> Perl character string
$header = encode('MIME-Header', $utf8);     # character string -> encoded-words

ABSTRACT

This module implements RFC 2047 MIME Header Encoding. There are three variant encoding names: MIME-Header, MIME-B and MIME-Q. The differences are described below:

              decode()          encode()  
MIME-Header   Both B and Q      =?UTF-8?B?....?=  
MIME-B        B only; Q croaks  =?UTF-8?B?....?=  
MIME-Q        Q only; B croaks  =?UTF-8?Q?....?=
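As an illustration of the MIME-Header row, the sketch below decodes a Q-encoded word and a B-encoded word, then encodes a string back. The sample strings are assumptions chosen for the example (the Q word is the one used in the MIME::Words answer); only the `=?UTF-8?B?` prefix of the encoded result is relied on, since line folding of longer strings may vary between Encode versions.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# MIME-Header decodes both B and Q encoded-words...
my $from_q = decode('MIME-Header', '=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=');
my $from_b = decode('MIME-Header', '=?UTF-8?B?R3LDvMOfZQ==?=');

# ...and encodes to UTF-8 B-encoding, e.g. =?UTF-8?B?R3LDvMOfZQ==?=
my $encoded = encode('MIME-Header', "Gr\x{fc}\x{df}e");

binmode STDOUT, ':encoding(UTF-8)';
print "$from_q\n$from_b\n$encoded\n";
```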
Palec
1800 INFORMATION

The Encode module handles this through its MIME-Header encoding, so try this:

use Encode qw(decode);
# $encoded holds the raw header value, e.g. '=?utf-8?B?T3.....?='
my $decoded = decode("MIME-Header", $encoded);
moritz
    That was helpful, thanks. By the way, if you are reading this while writing a mail script: I also needed print encode('utf-8', $headers_decoded) to display the decoded headers properly. – kagali-san Oct 25 '10 at 18:30
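The reason that extra `encode` is needed: `decode` returns a Perl character string, and output handles expect bytes. A minimal sketch of two ways to handle it, assuming UTF-8 output is wanted (the sample header value is made up). Option 2, an `:encoding(UTF-8)` output layer, is standard Perl I/O rather than something from the thread, and is shown on an in-memory filehandle so the bytes can be inspected.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# decode() returns a Perl *character* string ("Grüße" is 5 characters)...
my $headers_decoded = decode('MIME-Header', '=?UTF-8?B?R3LDvMOfZQ==?=');

# ...so it must be turned back into bytes before printing.
# Option 1: encode explicitly to UTF-8 bytes, as in the comment.
my $bytes = encode('UTF-8', $headers_decoded);    # 7 bytes
print $bytes, "\n";

# Option 2: give the filehandle an encoding layer and print characters
# directly (binmode STDOUT, ':encoding(UTF-8)' does the same for STDOUT).
open my $fh, '>:encoding(UTF-8)', \my $buffer or die "open: $!";
print {$fh} $headers_decoded;
close $fh;
# $buffer now holds the same 7 bytes as $bytes
```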

Check out RFC 2047. The 'B' means that the part between the last two '?'s is base64-encoded, and 'utf-8' means that the decoded data should be interpreted as UTF-8.

marijne

MIME::Words from MIME-tools works well for this too. I ran into some issues with Encode and found that MIME::Words succeeded on some strings where Encode did not.

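use MIME::Words qw(:all);
# In scalar context the decoded pieces are joined as raw bytes in their
# original charset (here ISO-8859-1); no charset conversion is done.
my $decoded = decode_mimewords(
    'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>',
);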
Palec
Philonious

This is a standard extension for charset labeling of headers, specified in RFC2047.

wnoise