
I'm trying to implement a basic MIME parser for multipart/related in C++/Qt.

So far I've written some basic parser code for headers, and I'm reading the RFCs to get an idea of how to do everything as close to the specification as possible. Unfortunately, there is a part in the RFCs that confuses me a bit:

From RFC822 Section 3.1.1:

Each header field can be viewed as a single, logical line of ASCII characters, comprising a field-name and a field-body. For convenience, the field-body portion of this conceptual entity can be split into a multiple-line representation; this is called "folding". The general rule is that wherever there may be linear-white-space (NOT simply LWSP-chars), a CRLF immediately followed by AT LEAST one LWSP-char may instead be inserted. Thus, the single line

Alright, so I parse a header field, and if a CRLF is followed by linear whitespace, I concatenate the continuation lines into a single logical header line. Let's proceed...
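The unfolding step described above could look roughly like this. It is only a minimal sketch in plain standard C++ (not Qt), and `unfoldHeaders` is a hypothetical helper name, not something from the RFC or from any library. It assumes the raw header block uses CRLF line endings and treats only space and horizontal tab as LWSP-chars.

```cpp
#include <string>

// Hypothetical helper: unfold a raw header block.
// A CRLF that is immediately followed by a space or horizontal tab is a
// fold, so the CRLF is dropped and the whitespace after it is kept as part
// of the logical line. Any other CRLF ends a header field and is preserved.
std::string unfoldHeaders(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        const bool isCrlf = raw[i] == '\r'
                            && i + 1 < raw.size() && raw[i + 1] == '\n';
        const bool foldFollows = isCrlf && i + 2 < raw.size()
                                 && (raw[i + 2] == ' ' || raw[i + 2] == '\t');
        if (foldFollows) {
            ++i;       // skip the LF too; the loop increment moves past it
            continue;  // the LWSP-char itself is copied on the next pass
        }
        out.push_back(raw[i]);
    }
    return out;
}
```

The whitespace that follows the fold is deliberately left in place, so the unfolded line still contains the linear whitespace that the fold replaced.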

From RFC2045 Section 5.1:

In the Augmented BNF notation of RFC 822, a Content-Type header field value is defined as follows:

 content := "Content-Type" ":" type "/" subtype
            *(";" parameter)
            ; Matching of media type and subtype
            ; is ALWAYS case-insensitive.

[...]

 parameter := attribute "=" value
 attribute := token
              ; Matching of attributes
              ; is ALWAYS case-insensitive.
 value := token / quoted-string
 token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
             or tspecials>

Okay. So it seems if you want to specify a Content-Type header with parameters, simply do it like this:

Content-Type: multipart/related; foo=bar; something=else

... and a folded version of the same header would look like this:

Content-Type: multipart/related;
    foo=bar;
    something=else
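To make the strict reading of that grammar concrete, here is a rough sketch of a parser that accepts only `type "/" subtype *(";" parameter)`. It is plain standard C++ rather than Qt, and the names `ContentType` and `parseContentTypeStrict` are made up for illustration. It assumes the field body has already been unfolded, carries no trailing CRLF, and contains no RFC 822 comments.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical result type; parameter names are stored lower-cased because
// attribute matching is case-insensitive.
struct ContentType {
    std::string type, subtype;
    std::map<std::string, std::string> params;
};

// token char: US-ASCII except SPACE, CTLs and tspecials (RFC 2045, 5.1)
static bool isTokenChar(char ch)
{
    static const std::string tspecials = "()<>@,;:\\\"/[]?=";
    const unsigned char c = static_cast<unsigned char>(ch);
    return c > 32 && c < 127 && tspecials.find(ch) == std::string::npos;
}

static void skipSpace(const std::string& s, std::size_t& pos)
{
    while (pos < s.size() && (s[pos] == ' ' || s[pos] == '\t')) ++pos;
}

static bool readToken(const std::string& s, std::size_t& pos, std::string& out)
{
    const std::size_t start = pos;
    while (pos < s.size() && isTokenChar(s[pos])) ++pos;
    out = s.substr(start, pos - start);
    return !out.empty();
}

// Reads a quoted-string; pos must point at the opening quote.
static bool readQuoted(const std::string& s, std::size_t& pos, std::string& out)
{
    out.clear();
    ++pos;
    while (pos < s.size() && s[pos] != '"') {
        if (s[pos] == '\\' && pos + 1 < s.size()) ++pos; // quoted-pair
        out.push_back(s[pos++]);
    }
    if (pos >= s.size()) return false; // unterminated quoted-string
    ++pos;                             // closing quote
    return true;
}

static std::string toLower(std::string s)
{
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Returns false as soon as the input deviates from the grammar, for example
// when a parameter is not preceded by ";".
bool parseContentTypeStrict(const std::string& body, ContentType& out)
{
    ContentType ct;
    std::size_t pos = 0;
    skipSpace(body, pos);
    if (!readToken(body, pos, ct.type)) return false;
    if (pos >= body.size() || body[pos] != '/') return false;
    ++pos;
    if (!readToken(body, pos, ct.subtype)) return false;
    ct.type = toLower(ct.type);
    ct.subtype = toLower(ct.subtype);
    for (;;) {
        skipSpace(body, pos);
        if (pos == body.size()) break;
        if (body[pos] != ';') return false; // missing parameter delimiter
        ++pos;
        skipSpace(body, pos);
        std::string attr, value;
        if (!readToken(body, pos, attr)) return false;
        if (pos >= body.size() || body[pos] != '=') return false;
        ++pos;
        if (pos < body.size() && body[pos] == '"') {
            if (!readQuoted(body, pos, value)) return false;
        } else if (!readToken(body, pos, value)) {
            return false;
        }
        ct.params[toLower(attr)] = value;
    }
    out = ct;
    return true;
}
```

With this sketch, both the single-line and the unfolded multi-line form of the header above parse to the same two parameters, while a parameter that follows another without a ";" in between is rejected.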

Correct? Good. As I kept reading the RFCs, I came across the following in RFC2387 Section 5.1 (Examples):

 Content-Type: Multipart/Related; boundary=example-1
         start="<950120.aaCC@XIson.com>";
         type="Application/X-FixedRecord"
         start-info="-o ps"

 --example-1
 Content-Type: Application/X-FixedRecord
 Content-ID: <950120.aaCC@XIson.com>

 [data]
 --example-1
 Content-Type: Application/octet-stream
 Content-Description: The fixed length records
 Content-Transfer-Encoding: base64
 Content-ID: <950120.aaCB@XIson.com>

 [data]

 --example-1--

Hmm, this is odd. Do you see the Content-Type header? It has a number of parameters, but not all have a ";" as parameter delimiter.

Maybe I misread the RFCs, but if my parser works strictly as the specification defines, the type and start-info parameters would be merged into the preceding parameter's value or, worse, cause a parser error.

Guys, what are your thoughts on this? Is it just a typo in the RFC? Or did I miss something?

Thanks!

BastiBen
  • When working with such standards, you should always be tolerant when reading the input and strict when writing the output. – Gumbo Jun 16 '10 at 07:17
  • It is a typo in the examples. Parameters must always be delimited with semicolons correctly, even when folded. The folding is not meant to change the semantics of a header, only to allow for readability and to account for systems that have line length restrictions. – Remy Lebeau Aug 24 '10 at 18:28
  • @Remy Lebeau: Why don't you post it as an answer so I can accept it? I tried to contact the original author of the RFC, but they haven't responded so far. – BastiBen Aug 25 '10 at 16:59
  • Great question, I had the same "*Wait, what?*" experience reading through 1521 and 2045. – Dan Lugg Jun 14 '13 at 13:04

2 Answers

It is a typo in the examples. Parameters must always be delimited with semicolons correctly, even when folded. The folding is not meant to change the semantics of a header, only to allow for readability and to account for systems that have line length restrictions.

Remy Lebeau
  • Absolutely a typo. Here's the BNF: `content := "Content-Type" ":" type "/" subtype *(";" parameter)`. http://tools.ietf.org/html/rfc2045#section-5.1. – james.garriss Jun 12 '13 at 17:30

Quite possibly a typo, but in general (and from experience) you should be able to handle this kind of thing "in the wild" as well. In particular, mail clients vary wildly in their ability to generate valid messages and follow all of the relevant specifications (if anything, it's even worse in the email/SMTP world than it is in the WWW world!)
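As one possible way to be tolerant here, a reader could treat semicolons and whitespace (including fold line breaks) alike as separators and simply collect every attribute=value pair it finds. The following is just a sketch of that idea in plain standard C++; `parseParamsLenient` and `ParamMap` are hypothetical names, and the approach itself is an assumption about what "tolerant" should mean, not something the RFCs prescribe.

```cpp
#include <cctype>
#include <map>
#include <string>

using ParamMap = std::map<std::string, std::string>;

// token char: US-ASCII except SPACE, CTLs and tspecials (RFC 2045, 5.1)
static bool isTokenChar(char ch)
{
    static const std::string tspecials = "()<>@,;:\\\"/[]?=";
    const unsigned char c = static_cast<unsigned char>(ch);
    return c > 32 && c < 127 && tspecials.find(ch) == std::string::npos;
}

// Hypothetical lenient reader: returns the parameters of a Content-Type
// field body, accepting parameters that are separated by ";", by
// whitespace only, or by any mix of the two.
ParamMap parseParamsLenient(const std::string& body)
{
    ParamMap params;
    const std::string separators = "; \t\r\n";

    // Skip leading whitespace and then the "type/subtype" part.
    std::size_t pos = body.find_first_not_of(" \t");
    if (pos == std::string::npos) return params;
    pos = body.find_first_of(separators, pos);
    if (pos == std::string::npos) return params; // no parameters at all

    while (pos < body.size()) {
        while (pos < body.size()
               && separators.find(body[pos]) != std::string::npos) ++pos;

        const std::size_t start = pos;
        while (pos < body.size() && isTokenChar(body[pos])) ++pos;
        std::string attr = body.substr(start, pos - start);
        if (attr.empty() || pos >= body.size() || body[pos] != '=') {
            if (pos == start) ++pos; // unexpected character: step over it
            continue;                // not an attribute=value pair
        }
        ++pos; // consume '='

        std::string value;
        if (pos < body.size() && body[pos] == '"') {
            ++pos;
            while (pos < body.size() && body[pos] != '"') {
                if (body[pos] == '\\' && pos + 1 < body.size()) ++pos;
                value.push_back(body[pos++]);
            }
            if (pos < body.size()) ++pos; // closing quote, if present
        } else {
            while (pos < body.size() && isTokenChar(body[pos]))
                value.push_back(body[pos++]);
        }

        // Attribute names are matched case-insensitively, so store lower-case.
        for (char& c : attr)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        params[attr] = value;
    }
    return params;
}
```

Fed the Content-Type field body from the RFC2387 example in the question, this sketch yields the four parameters boundary, start, type and start-info despite the missing semicolons. The trade-off is that it also silently accepts input a strict parser would reject, so it fits the reading side only; anything the parser writes should still use proper ";" delimiters.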

Dean Harding
  • I only have to handle MIME data from a handful of systems, and most of them produce valid MIME structures. But I'm considering releasing my MIME parser under the GPL or BSD license so everyone else can use it, too. – BastiBen Jun 16 '10 at 06:28