3

Many users of my site have reported problems downloading a large file (80 MB). I am forcing the download with headers; I can provide additional PHP settings if necessary. I am using the CakePHP framework, but this code is all plain PHP, running on PHP 5.2 with Apache on a dedicated virtual server from Media Temple (CentOS Linux). Do you see any problems with the following code?

        set_time_limit(1500);
        header("Content-Type: application/octet-stream");
        header("Content-Disposition: attachment; filename=\"" . basename($file_path) . "\"");
        header("Content-Length: ".$content_length);
        header("Content-Transfer-Encoding: binary");
        header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
        header('Cache-Control: private', false);
        header('Pragma: public');
        header('Expires: 0');

        //Change this part
        $handle = fopen($file_path, 'rb');
        while (!feof($handle))
        {
            echo fread($handle, 4096);
            ob_flush();
            flush();
        }
        fclose($handle);
        exit;

Basically, the problem being reported is that the download starts and then stops partway through. I thought it might be a time-limit problem, so I added the set_time_limit() call. I was using PHP's readfile() function before, but that did not work smoothly either.
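
The readfile() version was essentially the same headers followed by a single call, roughly like this (a simplified sketch, not the exact code I had):

        set_time_limit(1500);
        header("Content-Type: application/octet-stream");
        header("Content-Disposition: attachment; filename=\"" . basename($file_path) . "\"");
        header("Content-Length: " . filesize($file_path));

        // readfile() streams the file to the output buffer in chunks,
        // so the whole 80 MB is never held in memory at once
        readfile($file_path);
        exit;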

animuson
  • 53,861
  • 28
  • 137
  • 147
jimiyash
  • 2,494
  • 2
  • 20
  • 29
  • What about `set_time_limit(0)` ? – alex Dec 13 '10 at 01:48
  • What's the point of the loop to output the file, if you don't mind me asking? – El Yobo Dec 13 '10 at 01:49
  • @ElYobo My guess so it doesn't consume too much memory at once. – alex Dec 13 '10 at 01:51
  • I would assume the same. You don't know how big the files can potentially get. That being said, this gives more flexibility (and is a bit more robust) than a file_get_contents() call (as you'd typically see). – Brad Christie Dec 13 '10 at 01:52
  • 1
    But something like `readfile` would avoid loading it into memory at all (unless output buffering is enabled - but that will have the same problem even if you read it chunk by chunk). – El Yobo Dec 13 '10 at 02:01
  • 1
    I'd be inclined to avoid using PHP to echo large file data and instead use it to manage the creation/deletion of randomly named symlinks to a "hidden" storage path, unless you need security of course :) – Scuzzy Dec 13 '10 at 02:05
  • @Scuzzy so instead of a forced download, I should use a symlink to somewhere in a web-accessible directory, and they just click and download, and then I delete it after a certain amount of time? Would that take the load off the webserver then? – jimiyash Dec 13 '10 at 02:11
  • @Brad, also, could you clarify the flexibility/robustness improvements? I ask because I currently use readfile in our system and haven't had any problems so far; most files are not very large, however, so I'm trying to see whether this approach offers any advantages over what I'm currently doing. – El Yobo Dec 13 '10 at 02:24
  • @ElYobo, I don't see any issue with either. The going trend for those serving files through PHP for direct download is readfile (as mentioned, it's a direct dump to the output buffer). Both save PHP from loading the file completely and spit it out in segments (one just handles the file within the engine while the other is coder-managed). My personal opinion: use either. I may be 100% wrong, but I've never had trouble with either (other than carpal tunnel from fopen-ing it myself. ;-) – Brad Christie Dec 13 '10 at 02:42
  • :D My concern with the `fread` approach is that the entire file is (admittedly in chunks) read into memory at some point in time; PHP's memory handling (esp. in some 5.2 versions) is terrible, I find, so I'm not confident that each chunk will *really* be freed within the loop... But the example above is more or less direct from the comments on the readfile documentation, so presumably it's there to resolve a problem users are experiencing with readfile. – El Yobo Dec 13 '10 at 02:47
  • @jimiyash yes, webservers are great at serving static content. – Scuzzy Dec 13 '10 at 06:56

2 Answers

4

The problem with PHP-initiated HTTP transfers is that they seldom support partial requests:

    GET /yourfile HTTP/1.1
    Range: bytes=31489531-79837582

Whenever a browser encounters a transmission problem, it will try to resume the download. Your PHP script does not accommodate that (it's not trivial, so nobody does).

So really, avoid that. Redirect users to a static file and let your webserver handle it. If you need to handle authorization, use tricks like symlinks or RewriteRules that check for session cookies, or even a static permission file (./allowed/178.224.2.55-file-1). Any required extra HTTP headers can be injected likewise, or with a .meta file.
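
A minimal sketch of the symlink trick (assuming a web-accessible downloads/ directory with FollowSymLinks enabled; the paths and the token scheme are placeholders):

    // create a hard-to-guess symlink inside the web root and redirect the
    // browser to it, so Apache serves the static file itself (and handles
    // Range/resume requests for free)
    $token    = md5(uniqid(mt_rand(), true));        // random, single-use name
    $link_dir = '/var/www/html/downloads';           // placeholder web-accessible path
    $link     = $link_dir . '/' . $token . '-' . basename($file_path);

    if (!symlink($file_path, $link)) {
        die('Could not create download link');
    }

    header('Location: /downloads/' . rawurlencode(basename($link)));
    exit;

    // stale links still need cleanup elsewhere, e.g. a cron job that
    // deletes symlinks older than an hour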

mario
  • 144,265
  • 20
  • 237
  • 291
  • Interesting idea; do you know of a concrete example of this somewhere? – El Yobo Dec 13 '10 at 02:50
  • 1
    @ElYobo: For the .htaccess permission trick a simple `RewriteCond -f ./allow-%{REMOTE_ADDR}` might suffice. Byte-Range support is in Nanoweb and PEAR HTTP_Server IIRC. But a quick google gives: http://www.coneural.org/florian/papers/04_byteserving.php – mario Dec 13 '10 at 02:54
  • Cool, thanks. I'd need more security than the RewriteCond example (e.g. multiple users behind a proxy), but the paper is interesting. – El Yobo Dec 13 '10 at 02:58
  • @ElYobo: Yes, that's really only workable for the simplest of cases. But it might be possible to use a RewriteCond on `%{HTTP_COOKIE}` and check against a session-stampfile. But never tried that :] – mario Dec 13 '10 at 03:01
  • Do you think it would be the same thing if I just put the static file somewhere that can be read and just limit access by IP address in the .htaccess file for the file's directory? I could probably use PHP to write to a whitelist with Allow from 100.100.100.100 and just keep appending to it. – jimiyash Dec 13 '10 at 03:58
  • @jimiyash: I'm not sure if appending to .htaccess is reliable. At some point it grows too big, you have to reset it, and this might lead to race conditions. Ideally hard-to-guess, random symlink filenames should suffice. But you could also create temporary per-user directories with individual .htaccess whitelists (roughly sketched after this thread). – mario Dec 13 '10 at 04:30
  • @mario: I was thinking of deleting the whitelists daily during an off-peak time and regenerating them with php if they don't exist. Maybe if I used the sleep function, that would eliminate any race conditions. – jimiyash Dec 13 '10 at 04:52
  • @jimiyash: You'll have to give it a try. All depends on your specific use case. Don't overengineer. And should the simple solution have side effects, you can always up the ante. – mario Dec 13 '10 at 04:56
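
A rough sketch of the per-user whitelist directory idea from this thread (Apache 2.2-style Allow directives, and it assumes AllowOverride permits .htaccess in that path; the paths and secret are placeholders, with the daily cleanup happening elsewhere):

    // one directory per client IP, holding an .htaccess that only allows
    // that IP plus a symlink to the real file
    $ip   = $_SERVER['REMOTE_ADDR'];
    $hash = md5($ip . 'some-server-secret');          // placeholder secret
    $dir  = '/var/www/html/downloads/' . $hash;       // placeholder path

    if (!is_dir($dir)) {
        mkdir($dir, 0755);
        // Apache 2.2 syntax; Apache 2.4 would use "Require ip ..." instead
        file_put_contents($dir . '/.htaccess',
            "Order Deny,Allow\nDeny from all\nAllow from " . $ip . "\n");
    }

    $link = $dir . '/' . basename($file_path);
    if (!file_exists($link)) {
        symlink($file_path, $link);
    }

    header('Location: /downloads/' . $hash . '/' . rawurlencode(basename($file_path)));
    exit;
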
1

I don't see any trouble, but for S&G's try placing the set_time_limit() call inside the while loop. This ensures you don't hit a hard limit and, as long as the client keeps accepting data, the time limit keeps getting extended.
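
Something like this, just as a sketch (the 30-second figure per chunk is arbitrary; see the comments below):

    $handle = fopen($file_path, 'rb');
    while (!feof($handle)) {
        // reset the timer on every chunk: the script is only killed if a
        // single 4 KB chunk takes longer than 30 seconds to send
        set_time_limit(30);
        echo fread($handle, 4096);
        ob_flush();
        flush();
    }
    fclose($handle);
    exit;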

Brad Christie
  • 100,477
  • 16
  • 156
  • 200
  • You could just use `set_time_limit(0)` to impose no time limit. – alex Dec 13 '10 at 01:49
  • I tend to find that a bad idea, in case an operation hangs (for whatever reason). I try to always give PHP an opportunity to cut the tie; otherwise, if something goes wrong, you have a thread just sitting there dormant. – Brad Christie Dec 13 '10 at 01:51
  • How many seconds do you think I should extend by? Maybe 5-10? – jimiyash Dec 13 '10 at 02:01
  • How long does it take you to download 4096 _bytes_? ;-) You _could_ just use 30 seconds to be safe. That allows for hiccups in between while still not overdoing it. – Brad Christie Dec 13 '10 at 02:03