15

I need to programmatically initiate file downloads using PHP, with resume support.

These files are large, so buffering them fully in memory as below, or caching them, is not an option:

$content=file_get_contents($file);
header("Content-type: application/octet-stream");
header('Content-Disposition: attachment; filename="' . basename($file) . '"');
header("Content-Length: ". filesize($file));
echo $content;
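For comparison, here is a chunked sketch (the function name and chunk size are illustrative) that avoids the memory problem, though it still ties up a PHP process for the whole transfer, which is exactly what I would rather have Apache handle:

```php
<?php
// Sketch of a memory-friendly variant (names are illustrative): stream
// the file in 8 KB chunks instead of loading it all at once.
function streamFile(string $file): void
{
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename="' . basename($file) . '"');
    header('Content-Length: ' . filesize($file));

    $fp = fopen($file, 'rb');
    while (!feof($fp)) {
        echo fread($fp, 8192); // 8 KB at a time; whole file never in memory
        flush();
    }
    fclose($fp);
}
```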

The only viable option I have found so far is the Apache module mod_xsendfile (X-Sendfile). Unfortunately our hosting service won't install it, so we are looking for other hosting providers; but that's another story.

We are using LAMP and the yii framework. What are possible alternatives?

S B
  • 8,134
  • 10
  • 54
  • 108
  • Well actually, `x-sendfile` has been made for that. If your provider does not offer this, it will most certainly not offer any of the (available?) alternatives which as well must integrate with the server. – hakre Aug 25 '11 at 09:37
  • 3
    possible duplicate of [Resumable downloads when using PHP to send the file?](http://stackoverflow.com/questions/157318/resumable-downloads-when-using-php-to-send-the-file) - note that you may want to `fread` a part of the file, `echo` to client, repeat; instead of `file_get_contents` which can be problematic for huge files. – Piskvor left the building Aug 25 '11 at 09:42
  • @hakre: I was thinking on the lines of a PHP-only solution, whereby the URL request is first trapped by PHP where I log the request, check if the file exists and then pass on the request to Apache – S B Aug 25 '11 at 09:46
  • @Saptarshi Biswas: Well if you're using mod_php you can fire a subrequest, but I think the PHP will be still running. – hakre Aug 25 '11 at 09:48
  • @piskvor: On looking at the other question closely, this is not a duplicate. The objective of the other question is to handle file tunneling fully in PHP while hiding the actual file location. But my objective is to only initiate the download in PHP and then let Apache handle file serving. Also, these files are huge, so tunneling through PHP is not efficient in my case – S B Aug 25 '11 at 10:37
  • @Saptarshi Biswas: If you can't install the requested module, you can't have `X-Sendfile` (or similarly pass the control back to Apache), sorry. – Piskvor left the building Aug 25 '11 at 11:22
  • @Saptarshi Biswas: As to "not efficient" - please clarify: how is it "inefficient"? If you only read and echo a smaller block of the file at a time (e.g. tens of KB), your PHP script won't use much memory, even though it will send the (requested part of the) huge file in the end. (I have implemented such solution, it worked fairly quickly and without consuming undue memory, even for multi-GB files). – Piskvor left the building Aug 25 '11 at 11:26
  • It's inefficient because when apache serves files directly it uses `sendfile`, which is zero-copy. that's far better than reading from PHP to userspace, then output it, then read it from apache and finally write it back to kernel space before sending to the user. – Karoly Horvath Aug 25 '11 at 12:06
  • 2
    have you thought about using a CDN that supports some kind of ACL? e.g., see [here](http://stackoverflow.com/questions/1770502/using-a-cdn-like-amazon-s3-to-control-access-to-media) – ldg Aug 25 '11 at 18:15
  • @yi_H: I'm pretty sure Apache and PHP are both in userspace, and kernel doesn't come into it save for disk reads and net writes; you are indubitably correct that fewer layers of indirection will be more efficient, but I'm not sure how much more efficient (plus, anything is more efficient than "well sendfile would work, but you can't do it" ;)). – Piskvor left the building Aug 25 '11 at 18:19
  • 1
    @Piskvor: well actually I wrote an HTTP server so I know it means *a lot*. try to saturate a 10G interface and you'll see the difference (copy 1 GByte of data / second, or 2 if it's a module like PHP. Do you see the problem? Your RAM is not going to be fast enough. also, you will kill your CPU cache by having multiple copies of the same data... anyway, the duplicate question solves the problem, so it's still better than *nothing*. – Karoly Horvath Aug 25 '11 at 20:22
  • @yi_H: I'm not trying to downplay your expertise - quite the contrary, it's from experts that I've learned the most here. Thank you for the information, it is indeed very interesting. – Piskvor left the building Aug 26 '11 at 00:32
  • @ldg: Interesting & clean approach! Unfortunately in our case, we cannot generate short-lived & dynamic URLs, because the existing auto-update libraries already installed by our user-base are aware of only a preset & unique URL. I think I'll have to find a CDN that allows using x-sendfile – S B Aug 26 '11 at 05:55
  • @Piskvor: You are right that both A & P in Lamp are in user-space. But I was thinking in general that in a scalable solution, an additional layer should be avoided if it can be. Our philosophies might differ though, given other constraints and available workarounds. – S B Aug 26 '11 at 06:14
  • Re the CDN URLs, you can have a fixed URL pointing to your application and let that do the rights management and redirect to the CDN, creating the URL on-the-fly if need be. – ldg Aug 26 '11 at 18:42

2 Answers

1

Will your hosts allow you to install something like Perlbal (http://www.danga.com/perlbal/) as a proxy in front of Apache?

Perlbal allows you to offload file serving to it with an approach very similar to X-Sendfile (using `X-REPROXY-URL: /path/to/a/local/file.jpg`), and it's pretty high-performance (LiveJournal and Flickr both use(d) it). It would require you to run Apache on a different port, though, and run Perlbal on port 80, which your hosting provider might not like. Of course, you could do the same thing with something like nginx if you didn't fancy Perlbal.
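On the PHP side, the hand-off could look something like this sketch (the helper name is my own invention; `X-REPROXY-URL` is Perlbal's reproxy header): PHP does the logging and access checks, then tells Perlbal which local file to serve.

```php
<?php
// Hypothetical handler: authorize/log in PHP, then let Perlbal do the
// actual file transfer via its reproxy header.
function reproxyHeaders(string $path): array
{
    return [
        'Content-Disposition: attachment; filename="' . basename($path) . '"',
        'X-REPROXY-URL: ' . $path, // Perlbal intercepts this and serves the file
    ];
}

// In the request handler, after your own access checks:
// foreach (reproxyHeaders('/var/files/build.zip') as $h) { header($h); }
```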

Chris May
  • 670
  • 3
  • 6
  • No, unfortunately perlbal is not allowed. At any rate, I have moved on to a dedicated server so I can use x-sendfile among other things – S B Dec 02 '11 at 16:40
1

You could emulate that by reading the request headers and outputting the content in 4 KB chunks with fopen, fseek, fread and so on. See also the possible request headers here. You should also implement an ETag so the client can verify that the file has not changed.
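A minimal sketch of the byte-range part (the function names and the 4 KB chunk size are assumptions, not a fixed API): parse the client's `Range` header, then seek to the requested offset and stream only those bytes with a `206 Partial Content` response.

```php
<?php
// Sketch: parse "Range: bytes=..." into [start, end], or null for a full 200.
function parseRangeHeader(?string $header, int $fileSize): ?array
{
    if ($header === null || !preg_match('/^bytes=(\d*)-(\d*)$/', $header, $m)) {
        return null; // no or malformed Range header: serve the whole file
    }
    if ($m[1] === '' && $m[2] === '') {
        return null;
    }
    if ($m[1] === '') {           // suffix form "bytes=-N": the last N bytes
        $start = max(0, $fileSize - (int) $m[2]);
        $end   = $fileSize - 1;
    } else {
        $start = (int) $m[1];
        $end   = ($m[2] === '') ? $fileSize - 1 : min((int) $m[2], $fileSize - 1);
    }
    return ($start <= $end) ? [$start, $end] : null;
}

// Seek and emit the requested byte range in 4 KB steps.
function serveRange(string $file, int $start, int $end): void
{
    header('HTTP/1.1 206 Partial Content');
    header('Accept-Ranges: bytes');
    header(sprintf('Content-Range: bytes %d-%d/%d', $start, $end, filesize($file)));
    header('Content-Length: ' . ($end - $start + 1));
    // An ETag (e.g. derived from mtime + size) would let the client check
    // that the file has not changed between resume attempts.

    $fp = fopen($file, 'rb');
    fseek($fp, $start);
    $left = $end - $start + 1;
    while ($left > 0 && !feof($fp)) {
        $chunk = fread($fp, min(4096, $left)); // 4 KB at a time
        echo $chunk;
        $left -= strlen($chunk);
        flush();
    }
    fclose($fp);
}
```

The client's header arrives in `$_SERVER['HTTP_RANGE']`; when `parseRangeHeader()` returns null you would fall back to a plain 200 response with the full file.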

rekire
  • 47,260
  • 30
  • 167
  • 264