I'm looking for a drop-in include script / class that dissects multipart/form-data and fills up $_POST(+raw) and $_FILES from it. Usually PHP does that itself. But because the automatic handling is insufficient for me and makes php://input inaccessible[1], I'll probably be using something like this to prevent that:

RewriteRule .* - [E=CONTENT_TYPE:noparsing/for-you-php]
Does not work. Actual solution requires mod_headers and RequestHeader set...

The extraction procedure might not be that complex, but I'd rather use a well-tested solution. Above all, I would prefer an implementation that uses fgets for splitting and mimics the $_FILES handling closely and efficiently. Finding the end of binary payloads seems rather tricky to me, in particular when you have to strip off \r\n but might encounter clients that only send \n (not allowed, but possible).
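To make the "end of payload" problem concrete, here is a minimal sketch of the fgets idea. It assumes the boundary never occurs inside the data and that the delimiter passed in is already "--" plus the boundary token. `copy_part_body` is a hypothetical name, not an existing PHP function. The trick is to hold back one line, so only the line break directly before the delimiter gets stripped, whether it is \r\n or a bare \n.

```php
<?php
/**
 * Copy one part body from $in to $out, stopping at the next delimiter line.
 * Returns true when the closing delimiter ("--boundary--") or EOF was reached.
 * Hypothetical helper for illustration only.
 */
function copy_part_body($in, $out, string $delimiter): bool
{
    $held = null; // previous line, kept back until we know it is not the last one

    while (($line = fgets($in)) !== false) {
        if (strncmp($line, $delimiter, strlen($delimiter)) === 0) {
            // The line break before a delimiter belongs to the delimiter,
            // so strip exactly one \r\n (or bare \n) from the held line.
            if ($held !== null) {
                fwrite($out, preg_replace('~\r?\n\z~', '', $held));
            }
            $closing = $delimiter . '--';
            return strncmp($line, $closing, strlen($closing)) === 0;
        }
        if ($held !== null) {
            fwrite($out, $held); // interior lines are written byte for byte
        }
        $held = $line;
    }

    // Truncated request: flush what is left and report the end.
    if ($held !== null) {
        fwrite($out, $held);
    }
    return true;
}
```

One caveat: fgets without a length argument reads a whole "line", which for binary data without line feeds can be large. And a client that sends bare \n after binary data ending in \r cannot be told apart from a correct \r\n.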

I'm certain something like this exists. But I'm having a hard time googling it. Does anyone know an implementation? (PEAR::mimeDecode can be hacked to get sort of working for form-data, but is a memory hog.)

The use case in short: I need to preserve the raw field names (including whitespace and special characters) for logging, but can't always avoid file uploads.


For illustration, this is what a POST request looks like:

POST / HTTP/1.1
Host: localhost:8000
Content-Length: 17717
Content-Type: multipart/form-data; boundary=----------3wCuBwquE9P7A4OEylndVx

And after a \r\n\r\n sequence the multipart/ payload follows like this:

------------3wCuBwquE9P7A4OEylndVx
Content-Disposition: form-data; name="_charset_"

windows-1252
------------3wCuBwquE9P7A4OEylndVx
Content-Disposition: form-data; name=" text field \\ 1 \";inject=1"

text1 te twj sakfkl
------------3wCuBwquE9P7A4OEylndVx
Content-Disposition: form-data; name="file"; filename="dial.png"
Content-Type: image/png

IPNG Z @@@MIHDR@@B`@@B;HF@@@-'.e@@@AsRGB@.N\i@@@FbKGD@?@?@? ='S@@@     
@@@GtIMEGYAAU,#}BRU@@@YtEXtComment@Created with GIMPWANW@@ @IDATxZl]w|
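The Content-Disposition lines in the payload above could be picked apart roughly like this. This is only a sketch: it assumes parameters arrive as quoted strings with backslash escapes, as in the sample, and `disposition_params` is a made-up name. Other clients may escape differently (percent-encoding, for instance), which this does not handle.

```php
<?php
/**
 * Extract quoted parameters (name, filename, ...) from a
 * Content-Disposition header line without trimming or mangling them.
 * Hypothetical helper for illustration only.
 */
function disposition_params(string $header): array
{
    $params = array();

    // ;key="value" where the value may contain backslash-escaped characters
    preg_match_all(
        '~;\s*([a-z*_-]+)="((?:[^"\\\\]|\\\\.)*)"~i',
        $header,
        $matches,
        PREG_SET_ORDER
    );

    foreach ($matches as $pair) {
        // Undo the backslash escapes only; whitespace stays exactly as sent.
        $params[strtolower($pair[1])] = preg_replace('~\\\\(.)~s', '$1', $pair[2]);
    }

    return $params;
}
```

Because the whole quoted string is consumed in one match, the `;inject=1` inside the second field name stays part of the name instead of being read as a separate parameter.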
mario
  • This will probably become a bounty question.. – mario Apr 06 '11 at 04:43
  • Will it be guaranteed that each MIME part in the multipart will have a Content-Length? I don't remember if the spec requires this or not. I'd imagine it would. – Charles Apr 07 '11 at 01:43
  • The bummer is that the spec [RFC2388](http://www.faqs.org/rfcs/rfc2388.html) does not mention `Content-Length` at all. While I would assume most current browsers do so (and use base64 encoding at least), I'm actually trying to support the more wacky clients. (Edit: No, not even Opera does it.) – mario Apr 07 '11 at 01:52
  • Doesn't the "$argc" variable contain the raw post data? Or the empty string $_POST['']? Why do you say that automatic handling is insufficient? How does php://stdio work for you? – David d C e Freitas Apr 12 '11 at 16:03
  • @David: Neither `argc` nor `argv` are present for POST requests. And `php://stdin` and `php://input` are empty. PHP soaks up the complete POST body in `main/rfc1867.c`. That's why it is inaccessible. My issue is that leading spaces are stripped and many ASCII characters converted into `_` underscores. – mario Apr 12 '11 at 16:13

3 Answers

It's late and I can't test this at the moment but the following should do what you want:

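// $boundary may be set beforehand, e.g. from $_SERVER['CONTENT_TYPE']
$boundary = isset($boundary) ? $boundary : null;

// collected results, kept separate from the superglobals
$POST = array();
$FILES = array();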

if (is_resource($input = fopen('php://input', 'rb')) === true)
{

    while ((feof($input) !== true) && (($line = fgets($input)) !== false))
    {
        if (isset($boundary) === true)
        {
            $content = null;

            while ((feof($input) !== true) && (($line = fgets($input)) !== false))
            {
                $line = trim($line);

                if (strlen($line) > 0)
                {
                    $content .= $line . ' ';
                }

                else if (empty($line) === true)
                {
                    if (stripos($content, 'name=') !== false)
                    {
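                        // match the name parameter only (not filename) and keep its whitespace as sent
                        $name = (preg_match('~\bname="((?:[^"\\\\]|\\\\.)*)"~i', $content, $match) > 0) ? stripcslashes($match[1]) : '';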

                        if (stripos($content, 'Content-Type:') !== false)
                        {
                            $tmpname = tempnam(sys_get_temp_dir(), '');

                            if (is_resource($temp = fopen($tmpname, 'wb')) === true)
                            {
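                                $previous = null;

                                while ((feof($input) !== true) && (($line = fgets($input)) !== false) && (strpos($line, $boundary) !== 0))
                                {
                                    // write lines unmodified; binary data may contain line breaks
                                    if ($previous !== null)
                                    {
                                        fwrite($temp, $previous);
                                    }

                                    $previous = $line;
                                }

                                // only the line break directly before the boundary is not part of the file
                                if ($previous !== null)
                                {
                                    fwrite($temp, preg_replace('~\r?\n\z~', '', $previous));
                                }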

                                fclose($temp);
                            }

                            $FILES[$name] = array
                            (
                                'name' => (preg_match('~\bfilename="((?:[^"\\\\]|\\\\.)*)"~i', $content, $match) > 0) ? stripcslashes($match[1]) : '',
                                'type' => trim(preg_replace('~.*Content-Type: ([^\s]*).*~i', '$1', $content)),
                                'size' => sprintf('%u', filesize($tmpname)),
                                'tmp_name' => $tmpname,
                                'error' => UPLOAD_ERR_OK,
                            );
                        }

                        else
                        {
                            $result = null;

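                            while ((feof($input) !== true) && (($line = fgets($input)) !== false) && (strpos($line, $boundary) !== 0))
                            {
                                $result .= $line;
                            }

                            // only the line break directly before the boundary is not part of the value
                            $result = preg_replace('~\r?\n\z~', '', (string) $result);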

                            if (array_key_exists($name, $POST) === true)
                            {
                                if (is_array($POST[$name]) === true)
                                {
                                    $POST[$name][] = $result;
                                }

                                else
                                {
                                    $POST[$name] = array($POST[$name], $result);
                                }
                            }

                            else
                            {
                                $POST[$name] = $result;
                            }
                        }
                    }

                    if (strpos($line, $boundary) === 0)
                    {
                        //break;
                    }
                }
            }
        }

        else if ((is_null($boundary) === true) && (strpos($line, 'boundary=') !== false))
        {
            $boundary = "--" . trim(preg_replace('~.*boundary="?(.+)"?.*~i', '$1', $line));
        }
    }

    fclose($input);
}

echo '<pre>';
print_r($POST);
echo '</pre>';

echo '<hr />';

echo '<pre>';
print_r($FILES);
echo '</pre>';
Alix Axel
  • Couldn't test it either yet. Looks workable though. -- I've forgotten a few issues for the task, namely duplicate `name[]`, `name[]`, `name[]` request vars. And my POST depiction was misleading, the `boundary=` is never part of the body. But at least the `fgets` approach looks doable this way! – mario Apr 09 '11 at 17:11
  • @mario: Fixed a typo in the regex and added duplicated key support for $POST request variables. Could you clarify what you mean when you say the boundary is never part of the body? – Alix Axel Apr 09 '11 at 17:47
  • Thanks! I fixed my question example. The `;boundary=` appears in `$_SERVER["CONTENT_TYPE"]` only. The ://input body really starts at the first `------whatever`. But I've already adapted that with a small preg_match() beforehand, which I think works well since your code foresightedly tests $boundary with isset(). I'm taking a bit of time for a thorough test.. But again, I think this looks ok, and I can adapt it to my other weird needs. – mario Apr 09 '11 at 18:32
  • @mario: Oh, I see. I forgot to delete a `while` in my previous edit, fixed it now. As for the duplicate keys in `$FILES`, I would just assume `$FILES[$name][] = array()` every time, instead of just `$FILES[$name] = array()`. I actually find it easier to traverse the $_FILES superglobal that way (see https://github.com/alixaxel/phunction/blob/8c8a4c94432550cfd7b5e1d03bfd6b45e101e45d/_.php#L1308). – Alix Axel Apr 09 '11 at 18:44
  • Agreed. Having it use the indexed arrays is easier than having to discern the normal and nested structuring all the time. – mario Apr 09 '11 at 18:50
  • Still a few bugs. But I only accept the blame for one of them! :} The $boundary actually uses two more `--` dashes in the actual body. And the `\r\n` stripping should only occur at the end of each variable part/file. Also had to disable the `break;` as that skipped over sections, not sure what its purpose is. But regardless of that, the fgets approach seems efficient and doable after all. – mario Apr 12 '11 at 22:02
  • @mario: Great! Sorry about the `break`, I was in doubt but hadn't the time to test if it was working as I thought. – Alix Axel Apr 12 '11 at 23:05
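As the comments note, the boundary only appears in `$_SERVER["CONTENT_TYPE"]`, and the body uses it with two extra leading dashes. A small sketch of the "preg_match() beforehand" step could look like this. `multipart_delimiter` is a hypothetical name, and the pattern is an assumption about which header forms need to be accepted (quoted and unquoted boundary values).

```php
<?php
/**
 * Return the body delimiter ("--" plus the boundary token) for a
 * Content-Type value, or null if it is not multipart/form-data.
 * Hypothetical helper for illustration only.
 */
function multipart_delimiter(string $contentType): ?string
{
    $pattern = '~^multipart/form-data\b.*?;\s*boundary=(?:"([^"]+)"|([^;\s]+))~i';

    if (!preg_match($pattern, $contentType, $m)) {
        return null;
    }

    // Group 1 holds a quoted boundary, group 2 an unquoted one.
    $token = ($m[1] !== '') ? $m[1] : $m[2];

    return '--' . $token;
}
```

The result could then be assigned to `$boundary` before the parsing loop runs, for example `$boundary = multipart_delimiter($_SERVER['CONTENT_TYPE']);`.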

Maybe the php.ini directive enable_post_data_reading could help, but it seems it was only added in PHP 5.4. I still have an earlier version, so I could not test it :(

From PHP Manual:

enable_post_data_reading boolean

Disabling this option causes $_POST and $_FILES not to be populated. The only way to read postdata will then be through the php://input stream wrapper. This can be useful to proxy requests or to process the POST data in a memory efficient fashion.
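A script could check at runtime whether that directive is actually off before relying on php://input. This is an untested sketch in the same spirit as the rest of this answer, and `post_parsing_disabled` is a made-up helper name. As far as I know, the directive cannot be changed with ini_set() and has to be set in php.ini or per directory.

```php
<?php
/**
 * Decide from an ini_get() result whether PHP's own POST parsing is off.
 * ini_get() returns false for unknown directives (PHP before 5.4),
 * in which case PHP still parses the body itself.
 * Hypothetical helper for illustration only.
 */
function post_parsing_disabled($iniValue): bool
{
    if ($iniValue === false) {
        return false;
    }

    // "", "0" and "Off" all count as disabled; "1" and "On" as enabled.
    return !filter_var($iniValue, FILTER_VALIDATE_BOOLEAN);
}
```

Usage would be along the lines of `if (post_parsing_disabled(ini_get('enable_post_data_reading'))) { $input = fopen('php://input', 'rb'); }`.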

Snifff

Reading the comments, how about encoding the data before it is POSTed instead? Get the client to send the POST data in UTF-8 or even URL-encoded; then the ASCII characters that got lost will be transmitted without you having to write your own POST handler, which could well introduce its own bugs...

boisvert
  • Nah, can't do that. That's my weird requirement here. I need to **intercept** an ordinary POST request. I have no influence over the clients, and need to support standard forms. I can escape the whole issue by using an `application/x-www-form-urlencoded` POST instead of `multipart/form-data`. But that defeats the purpose and makes file uploads impossible. I'll have to go with the workaround and all its potential problems. – mario Apr 13 '11 at 08:56