How can I encode the value of a filename according to the encoding of MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations (RFC 2231)?
1 Answer
I think this should do it:
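function rfc2231_encode($name, $value, $charset='', $lang='', $ll=78) {
    // Octets that must not appear literally in a parameter name or an extended value:
    // controls, space, "*", "'", "%", the tspecials, and anything non-ASCII.
    $special = '/[\x00-\x20*\'%()<>@,;:\\\\"\/[\]?=\x7F-\xFF]/';
    if (strlen($name) === 0 || preg_match($special, $name)) {
        // invalid parameter name
        return false;
    }
    // Charset names such as "utf-8" or "iso-8859-1" contain digits, so allow them.
    if (strlen($charset) !== 0 && !preg_match('/^[A-Za-z0-9][A-Za-z0-9_.:+-]*$/', $charset)) {
        // invalid charset
        return false;
    }
    if (strlen($lang) !== 0 && !preg_match('/^[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*$/', $lang)) {
        // invalid language
        return false;
    }
    // $value is assumed to be encoded in $charset already; it is only percent-encoded here.
    $value = "$charset'$lang'".preg_replace_callback($special, function($match) { return rawurlencode($match[0]); }, $value);
    $nlen = strlen($name);
    $vlen = strlen($value);
    if ($nlen + $vlen <= $ll - 3) {
        // " $name*=$value" fits on one line
        return " $name*=$value";
    }
    // Otherwise split the value into numbered sections (continuations).
    $sections = array();
    for ($i = 0, $n = 0; $i < $vlen; $i += strlen($chunk), $n++) {
        // room left after " $name*$n*=" (the ";" added by implode() is not counted)
        $j = $ll - $nlen - strlen((string)$n) - 4;
        if ($j < 3) {
            // line length too small for this parameter name
            return false;
        }
        $chunk = substr($value, $i, $j);
        // never split a %XX escape across two sections
        if ($i + $j < $vlen && preg_match('/%[0-9A-Fa-f]?$/', $chunk, $m)) {
            $chunk = substr($chunk, 0, -strlen($m[0]));
        }
        $sections[] = " $name*$n*=$chunk";
    }
    return implode(";\r\n", $sections);
}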
Note that the function expects its output to be placed on a line of its own, preceded by a proper line break (i.e. CRLF), for example:
"Content-Type: application/x-stuff;\r\n".rfc2231_encode('title', 'This is even more ***fun*** isn\'t it!', 'us-ascii', 'en', 48)
The output is:
Content-Type: application/x-stuff;
 title*0*=us-ascii'en'This%20is%20even%20more%20;
 title*1*=%2A%2A%2Afun%2A%2A%2A%20isn%27t%20it!
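A quick way to sanity-check output like this is to decode it again. The following is only a minimal sketch of the receiving side, not part of the function above: `rfc2231_decode_sketch` is a hypothetical helper name, and it assumes unquoted section values that contain no semicolons. It shows the rule that a section is percent-decoded only when its name ends in `*`, and that only section 0 carries the `charset'language'` prefix.

```php
<?php
// Hypothetical helper for illustration: reassembles the RFC 2231
// continuations of one parameter and percent-decodes the flagged sections.
// Assumption: section values are unquoted and contain no ";".
function rfc2231_decode_sketch(string $header, string $name): ?array {
    $pattern = '/;\s*' . preg_quote($name, '/') . '\*(\d+)(\*?)=([^;\r\n]*)/';
    if (!preg_match_all($pattern, $header, $matches, PREG_SET_ORDER)) {
        return null; // no continuation sections found
    }
    $sections = array();
    foreach ($matches as $m) {
        // $m[1] = section number, $m[2] = "*" if percent-encoded, $m[3] = raw text
        $sections[(int)$m[1]] = array($m[2] === '*', $m[3]);
    }
    ksort($sections); // sections may arrive in any order

    $charset = '';
    $lang = '';
    $value = '';
    foreach ($sections as $n => $section) {
        list($encoded, $text) = $section;
        if ($n === 0 && $encoded) {
            // Only the first section carries charset'language'
            $parts = explode("'", $text, 3);
            if (count($parts) === 3) {
                list($charset, $lang, $text) = $parts;
            }
        }
        $value .= $encoded ? rawurldecode($text) : $text;
    }
    return array('charset' => $charset, 'language' => $lang, 'value' => $value);
}

$header = "Content-Type: application/x-stuff;\r\n"
        . " title*0*=us-ascii'en'This%20is%20even%20more%20;\r\n"
        . " title*1*=%2A%2A%2Afun%2A%2A%2A%20isn%27t%20it!";
$decoded = rfc2231_decode_sketch($header, 'title');
echo $decoded['value'], "\n"; // This is even more ***fun*** isn't it!
```

If the second section were named `title*1=` instead, this sketch would leave its `%2A` sequences undecoded, which is why the trailing `*` matters.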

Gumbo
- I just wanted something that takes one parameter, `rfc_2231_encode($filename)`, but it seems that the length of the attribute name (in this case, the length of `'filename'`) is needed. Is there a maximum number of characters for `$name` + `$value`? I ask because of `$ll=78`. What is `$ll` for? – Juanjo Conti Feb 11 '11 at 12:43
- @Juanjo Conti: The minimum parameters are *name* and *value*, so: `rfc2231_encode('filename', $filename)`. `$ll` is just the maximum line length. – Gumbo Feb 11 '11 at 12:55
- Note: limiting the line length isn't needed for HTTP. Question: the charset doesn't seem to be used for actually mapping from characters to octets; am I missing something? (I'm not a PHP programmer.) – Julian Reschke Feb 11 '11 at 17:57
- @Julian Reschke: No, the *charset* is only used for proper declaration. I assume that *value* is already properly encoded using the encoding specified in *charset*. – Gumbo Feb 11 '11 at 19:41
- @Julian Reschke: And if HTTP really doesn’t require a line limit, then the function will get quite a bit more comprehensive. – Gumbo Feb 11 '11 at 19:43
- Nice post, but your output is invalid! The `title*1=` content is encoded, so it must be `title*1*=`. Notice the extra `*`, which tells the decoder that the section is encoded. In the RFC example, `title*1` is not encoded and therefore has no extra `*`. Also, your output is missing a semicolon after `title*0*=...`. It is not in the RFC example, but if you check the [errata](http://www.rfc-editor.org/errata_search.php?rfc=2231&eid=590), you will see that it is fixed there. Please fix these bugs: the output is invalid, and a decoder expecting correctly encoded emails might not be able to decode it. – foens Feb 12 '11 at 09:47
- What if I want to use it with the "X-Sendfile" header, which doesn't use a "filename="? Could you provide an implementation without the `$name` argument? – Juanjo Conti Feb 14 '11 at 14:22
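Picking up the comments that ask for a one-argument version and point out that HTTP needs no line-length limit, a minimal sketch for the HTTP `Content-Disposition` case might look like the following. It is not the answer's function: `content_disposition_sketch` is a hypothetical name, the input is assumed to be UTF-8 already (the charset is only declared, as discussed above, never converted), and the ASCII fallback rule is an arbitrary illustrative choice. It emits a single `filename*` parameter with no continuations.

```php
<?php
// Hypothetical helper for illustration: builds a Content-Disposition value
// for HTTP. Assumption: $filename is already encoded as UTF-8.
function content_disposition_sketch(string $filename): string {
    // Plain ASCII fallback for clients that ignore filename*:
    // every byte outside a conservative safe set becomes "_".
    $fallback = preg_replace('/[^A-Za-z0-9._-]/', '_', $filename);

    // rawurlencode() leaves only letters, digits, "-", "_", "." and "~"
    // unescaped, so the result is safe inside an extended parameter value.
    // The language part between the two quotes is left empty.
    $extended = "UTF-8''" . rawurlencode($filename);

    return 'attachment; filename="' . $fallback . '"; filename*=' . $extended;
}

// "na\xC3\xAFve" is "naïve" spelled out as UTF-8 bytes.
echo 'Content-Disposition: ', content_disposition_sketch("na\xC3\xAFve file.txt"), "\n";
// Content-Disposition: attachment; filename="na__ve_file.txt"; filename*=UTF-8''na%C3%AFve%20file.txt
```

Because nothing is folded, there is no `$ll` to pass, and the parameter name is fixed, which leaves the filename as the only argument.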