By checking the source of some emails, I found that many of them use the 'Encoded Words' (RFC 2047) format to encode filename parameter values. However, according to RFC 2047, this encoding should not be used for header parameter values. Instead, a parameter value, such as the filename parameter in the Content-Disposition header, should use the encoding defined in RFC 2231.

Thus, my question is why so many emails don't comply with the RFC standards. Is it correct to encode header parameter values in RFC 2047 format? Can all email agents parse these emails properly?

melpomene
Gödel

1 Answer

The sad truth is that many popular email clients are in violation of pertinent RFCs.

Indeed, as you surmise, filenames in MIME body parts should use RFC 2231, but many implementations out in the wild use RFC 2047 or a number of other informal, ad-hoc, or at worst indeterminable filename encodings.
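To make the difference concrete, here is a small Python sketch (my own illustration, standard library only; the sample name "Gödel" is just an arbitrary non-ASCII string) which renders the same text in both encodings.

```python
from email.header import Header, decode_header, make_header
from email.utils import encode_rfc2231

filename = "Gödel"  # arbitrary sample value, not taken from any real message

# RFC 2231 extended value: charset'language'percent-encoded-bytes.
# This is the form prescribed for MIME parameter values.
rfc2231_value = encode_rfc2231(filename, charset="utf-8")
print("filename*=" + rfc2231_value)  # filename*=utf-8''G%C3%B6del

# RFC 2047 encoded word: fine in Subject: or a display name,
# but not supposed to appear inside a parameter value.
rfc2047_value = Header(filename, charset="utf-8").encode()
print(rfc2047_value)

# Sanity check: the encoded word decodes back to the original text.
round_trip = str(make_header(decode_header(rfc2047_value)))
print(round_trip == filename)
```

Both strings carry the same bytes; the only thing that differs is the wrapper syntax, which is precisely what a strict parser cares about.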

As for the "why", I don't really think this is answerable. Fundamentally I think we can't do better than guess it's a mistake at some level.

Common and easily identified incorrect encodings seem to work fairly transparently between popular clients; but by definition, failure to adhere to the specification removes any guarantee that the recipient can correctly guess what was intended.
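As a rough idea of what such guessing might look like on the receiving side, here is a hedged Python sketch. The helper `guess_filename` is hypothetical (it is not part of any library), and it only covers the two easily identified cases; it does not handle RFC 2231 continuations (`filename*0*=` and so on), and an unknown charset label will raise `LookupError`.

```python
from email.header import decode_header, make_header
from urllib.parse import unquote_to_bytes


def guess_filename(param_name: str, raw_value: str) -> str:
    """Best-effort decoding of a filename parameter (hypothetical helper)."""
    if param_name.endswith("*") and raw_value.count("'") >= 2:
        # Compliant case, RFC 2231: charset'language'percent-encoded-bytes
        charset, _language, encoded = raw_value.split("'", 2)
        return unquote_to_bytes(encoded).decode(charset or "us-ascii", errors="replace")
    if "=?" in raw_value:
        # Non-compliant but common: RFC 2047 encoded word(s) in a parameter value
        return str(make_header(decode_header(raw_value)))
    # Plain value, nothing to decode
    return raw_value


print(guess_filename("filename*", "UTF-8''G%C3%B6del"))
print(guess_filename("filename", "=?utf-8?Q?G=C3=B6del?="))
```

Anything which falls outside these branches is returned unchanged, which is roughly the point: once a sender leaves the specification, the recipient is reduced to heuristics like these.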

For reference, here is a model message which should hopefully pass validation (-:

From: me <tripleee@example.org>
To: =?utf-8?Q?G=C3=B6del?= <goedel@example.net>
Subject: File name and recipient are identical,
  but encoded differently
Mime-Version: 1.0
Content-type: application/octet-stream;
  name*=UTF-8''G%C3%B6del
Content-disposition: attachment;
  filename*=UTF-8''G%C3%B6del
Content-transfer-encoding: base64

R8O2ZGVsCg==

For the record, the Content-Type: header's name parameter is superseded by the filename parameter of the Content-Disposition: header, but many implementations still conservatively specify both, in case some client somewhere still doesn't grok Content-Disposition:.
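If you want to produce a body part like this from Python, the following is a minimal sketch using the legacy `email.message.Message` API, where passing a `(charset, language, value)` tuple to `add_header` requests RFC 2231 encoding. The exact serialization (quoting, folding, charset case) may vary between Python versions, so treat it as an illustration and not as a byte-for-byte recipe.

```python
import base64
from email.message import Message

name = "Gödel"  # same sample name for both parameters

msg = Message()
msg["MIME-Version"] = "1.0"
# A (charset, language, value) tuple makes add_header emit an RFC 2231
# extended parameter (name*=... / filename*=...).
msg.add_header("Content-Type", "application/octet-stream",
               name=("utf-8", "", name))
msg.add_header("Content-Disposition", "attachment",
               filename=("utf-8", "", name))
msg["Content-Transfer-Encoding"] = "base64"
# The payload is base64-encoded by hand to match the declared
# Content-Transfer-Encoding.
msg.set_payload(base64.b64encode((name + "\n").encode("utf-8")).decode("ascii"))

print(msg.as_string())
# Parsing side: get_filename() prefers Content-Disposition's filename
# and falls back to Content-Type's name.
print(msg.get_filename())
```

Specifying both parameters costs a few bytes and keeps older clients happy, which is presumably why so many generators still do it.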

tripleee
  • So sad...Actually, I constructed some emails with Python email module, which complies with RFC 2231, and sent these emails to several recipients. Some recipients told me that the Excel attachments in my emails became '.dat' files and they couldn't open the attachment files. I suspect that it is because Python email uses the RFC 2231 style encoding methods for the filename parameter values and some clients can't parse RFC 2231 encoding properly. I'm not sure about this. I didn't find any solutions to this problem on the web. – Gödel Aug 07 '18 at 06:55
  • Without a sample of those emails and information about what client they used, that's hard to analyze or validate. I would speculate that the problem lies elsewhere. – tripleee Aug 07 '18 at 06:59