
How can I encode a filename value according to RFC 2231 (MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations)?

Gumbo
Juanjo Conti

1 Answer


I think this should do it:

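function rfc2231_encode($name, $value, $charset='', $lang='', $ll=78) {
    // bytes that are not allowed in an RFC 2231 attribute-char
    $special = '/[\x00-\x20*\'%()<>@,;:\\\\"\/[\]?=\x7F-\xFF]/';
    if (strlen($name) === 0 || preg_match($special, $name)) {
        // invalid parameter name;
        return false;
    }
    // charset names may contain digits and a few symbols (e.g. UTF-8, ISO-8859-1)
    if (strlen($charset) !== 0 && !preg_match('/^[A-Za-z0-9!#$%&+\-^_`{}~]+$/', $charset)) {
        // invalid charset;
        return false;
    }
    if (strlen($lang) !== 0 && !preg_match('/^[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*$/', $lang)) {
        // invalid language;
        return false;
    }
    // percent-encode every special byte and prepend charset'language'
    $value = "$charset'$lang'".preg_replace_callback($special, function($match) { return rawurlencode($match[0]); }, $value);
    $nlen = strlen($name);
    $vlen = strlen($value);
    if ($nlen + $vlen > $ll-3) {
        // too long for a single " name*=value" line: split into numbered sections
        $sections = array();
        $section = 0;
        for ($i=0; $i<$vlen; $i+=strlen($chunk)) {
            // room left on the line after the " name*N*=" prefix
            $j = $ll - $nlen - strlen((string)$section) - 4;
            if ($j < 3) {
                // line length too small to make progress
                return false;
            }
            $chunk = substr($value, $i, $j);
            // do not cut a %XX triplet in half
            if ($i + $j < $vlen && preg_match('/%[0-9A-Fa-f]?$/', $chunk, $m)) {
                $chunk = substr($chunk, 0, -strlen($m[0]));
            }
            $sections[$section++] = $chunk;
        }
        foreach ($sections as $i => $chunk) {
            // every section is percent-encoded, hence the trailing "*"
            $sections[$i] = " $name*$i*=".$chunk;
        }
        return implode(";\r\n", $sections);
    } else {
        return " $name*=$value";
    }
}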

Note that this function expects that the output is used in a separate line preceded by a proper line wrap (i.e. CRLF), e.g.:

"Content-Type: application/x-stuff;\r\n".rfc2231_encode('title', 'This is even more ***fun*** isn\'t it!', 'us-ascii', 'en', 48)

The output is:

Content-Type: application/x-stuff;
 title*0*=us-ascii'en'This%20is%20even%20more%20;
 title*1*=%2A%2A%2Afun%2A%2A%2A%20isn%27t%20it!
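
To sanity-check output like this, it can help to run it back through a decoder. The following is only a minimal sketch: `rfc2231_decode_param` is a hypothetical helper name (not a PHP built-in and not part of the function above), and it assumes that unencoded sections do not contain semicolons inside quoted strings. It shows how the trailing `*` decides whether a section is percent-decoded and how the `charset'language'` prefix is taken from section 0.

```php
<?php
// Hypothetical helper (an assumption for illustration, not a PHP built-in):
// reassembles an RFC 2231 parameter from its sections and decodes it.
function rfc2231_decode_param(string $header, string $name): ?array
{
    // matches ";name*=v", ";name*N*=v" and ";name*N=v"
    $re = '/;\s*' . preg_quote($name, '/') . '\*(\d*)(\*?)=([^;\r\n]*)/i';
    if (!preg_match_all($re, $header, $matches, PREG_SET_ORDER)) {
        return null;
    }
    $sections = [];
    foreach ($matches as $m) {
        // "name*=" and "name*N*=" are percent-encoded, "name*N=" is not
        $encoded = ($m[1] === '' || $m[2] === '*');
        $sections[(int)$m[1]] = [$encoded, trim($m[3], " \t\"")];
    }
    ksort($sections); // sections may arrive in any order

    $charset = '';
    $lang = '';
    $value = '';
    foreach ($sections as $n => [$encoded, $text]) {
        if ($n === 0 && $encoded) {
            // only the first encoded section carries charset'language'
            $parts = explode("'", $text, 3);
            if (count($parts) === 3) {
                [$charset, $lang, $text] = $parts;
            }
        }
        $value .= $encoded ? rawurldecode($text) : $text;
    }
    return ['charset' => $charset, 'lang' => $lang, 'value' => $value];
}

$header = "Content-Type: application/x-stuff;\r\n"
        . " title*0*=us-ascii'en'This%20is%20even%20more%20;\r\n"
        . " title*1*=%2A%2A%2Afun%2A%2A%2A%20isn%27t%20it!";
$result = rfc2231_decode_param($header, 'title');
echo $result['value'], "\n"; // This is even more ***fun*** isn't it!
```

The returned value is still in the declared charset; converting it to another encoding is left to the caller.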

See also Test Cases for HTTP Content-Disposition header field and the Encodings defined in RFC 2047 and RFC 2231/5987.
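
For the HTTP case, where no line folding is involved, a single `filename*=` parameter in the RFC 5987 style is usually enough. Here is a rough sketch under some assumptions: `content_disposition_attachment` is a made-up helper name, the input is assumed to be UTF-8 already, and only the characters listed as `attr-char` in RFC 5987 are left unencoded.

```php
<?php
// Hypothetical helper (an assumption for illustration): builds a
// Content-Disposition value with an RFC 5987 style filename* parameter.
// Assumes $utf8Filename is already encoded as UTF-8.
function content_disposition_attachment(string $utf8Filename): string
{
    // Percent-encode every byte that is not an RFC 5987 attr-char:
    // ALPHA / DIGIT / "!" "#" "$" "&" "+" "-" "." "^" "_" "`" "|" "~"
    $encoded = preg_replace_callback(
        '/[^A-Za-z0-9!#$&+\-.^_`|~]/',
        function ($m) { return rawurlencode($m[0]); },
        $utf8Filename
    );
    // no continuations and no language tag: one parameter on one line
    return "attachment; filename*=UTF-8''" . $encoded;
}

// "\xC3\xAF" is the UTF-8 byte sequence for U+00EF
echo content_disposition_attachment("na\xC3\xAFve file.txt"), "\n";
// attachment; filename*=UTF-8''na%C3%AFve%20file.txt
```

In a script this would be sent with something like `header('Content-Disposition: ' . content_disposition_attachment($filename));`.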

Gumbo
  • Could you provide an example of use? – Juanjo Conti Feb 11 '11 at 12:22
  • I just wanted something that receives one parameter: rfc_2231_encode($filename), but it seems that the attribute name length (in this case count('filename')) is needed. Is there a maximum number of characters for $name + $value? I ask because of $ll=78. What is $ll for? – Juanjo Conti Feb 11 '11 at 12:43
  • @Juanjo Conti: The minimum parameters are *name* and *value*, so: `rfc2231_encode('filename', $filename)`. `$ll` is just the maximum line length. – Gumbo Feb 11 '11 at 12:55
  • Note: limiting the line length isn't needed for HTTP. Question: the charset doesn't seem to be used for actually mapping from characters to octets; am I missing something? (I'm not a PHP programmer.) – Julian Reschke Feb 11 '11 at 17:57
  • @Julian Reschke: No, the *charset* is only used for proper declaration. I assume that *value* is already properly encoded using the encoding specified in *charset*. – Gumbo Feb 11 '11 at 19:41
  • @Julian Reschke: And if HTTP really doesn’t require a line limit, then the function will get quite a bit more comprehensive. – Gumbo Feb 11 '11 at 19:43
  • Nice post, but your output is invalid! The title*1= content is encoded, so it must be title*1*=. Notice the extra *, which tells the decoder that the section is encoded. In the RFC example, title*1 is not encoded and therefore does not have an extra *. Your output is also missing a semicolon after title*0*=... It is not in the RFC example, but if you check the [errata](http://www.rfc-editor.org/errata_search.php?rfc=2231&eid=590), you will see that it is fixed there. Please fix these bugs, as your output is invalid and a decoder expecting correctly encoded emails might not be able to decode it. – foens Feb 12 '11 at 09:47
  • What if I want to use it with the "X-Sendfile" header, which doesn't use a "filename="? Could you provide an implementation without the $name argument? – Juanjo Conti Feb 14 '11 at 14:22
  • @Juanjo Conti: What is *X-Sendfile* and where is it specified? – Gumbo Feb 14 '11 at 14:34