6

I need to deliver big files like file.zip (~2 GB) to customers, with a unique URL for each customer. Then I will rewrite (with .htaccess) each customer's download link example.com/download/f6zDaq/file.zip to something like

example.com/download.php?id=f6zDaq&file=file.zip

But as the files are big, I don't want serving them through PHP (instead of just letting Apache handle them) to become a CPU / RAM performance issue for my server. After all, asking PHP to do it adds a new layer, which might cause such an issue if not done properly.

Question: among the following solutions, which one(s) are best practice, in particular in terms of CPU/RAM?

  • 1: PHP solution with application/download

    header('Content-Type: application/download');
    header('Content-Disposition: attachment; filename=file.zip');
    readfile("/path/to/file.zip");
    

    CPU usage measured while downloading: 13.6%.

  • 1bis: PHP solution with application/octet-stream (coming from Example #1 of this page)

    header('Content-Description: File Transfer');
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename=file.zip');
    header('Expires: 0');
    header('Cache-Control: must-revalidate');
    header('Pragma: public');
    header('Content-Length: ' . filesize('/path/to/file.zip'));
    readfile("/path/to/file.zip");
    
  • 1ter: PHP solution with application/octet-stream (coming from here):

    header('Content-Description: File Transfer');
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename=file.zip'); 
    header('Content-Transfer-Encoding: binary'); // additional line
    header('Connection: Keep-Alive');
    header('Expires: 0');
    header('Cache-Control: must-revalidate, post-check=0, pre-check=0'); // additional line
    header('Pragma: public');
    header('Content-Length: ' . filesize('/path/to/file.zip'));
    readfile("/path/to/file.zip");
    
  • 1quater: Another PHP variant with application/force-download (edited; coming from here):

    header("Content-Disposition: attachment; filename=file.zip");
    header("Content-Type: application/force-download");
    header("Content-Length: " . filesize($file));
    header("Connection: close");
    
  • 2: Apache solution, no PHP involved: let Apache serve the file, and use .htaccess to provide a different URL for the same file (this can be done in many ways). In terms of performance, it's similar to letting the customer download example.com/file.zip served directly by Apache.

  • 3: Another PHP solution. This would probably work:

    $myfile = file_get_contents("file.zip");
    echo $myfile;
    

    but wouldn't this ask PHP to load the whole file into memory (which would be bad for performance)?

  • 4: Just do a header("Location: /abcd/file.zip"); redirection as explained in File with a short URL downloaded with original filename.

    Problem with this solution: this discloses the actual location of the file

     example.com/abcd/file.zip
    

    to the end user (who can then use or share this URL without authentication), which is not wanted...

    But on the other hand, it is much lighter for the CPU since PHP just redirects the request and doesn't deliver the file itself.

    CPU usage measured while downloading: 10.6%.


Note: the readfile doc says:

readfile() will not present any memory issues, even when sending large files, on its own. If you encounter an out of memory error ensure that output buffering is off with ob_get_level().

but I wanted to be 100% sure that it won't be slower / more CPU/RAM hungry than pure Apache solution.
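
For reference, a minimal sketch of what that doc note suggests (disabling output buffering before readfile(); the path is a placeholder):

<?php
// flush and disable every active output buffer so readfile() streams directly
while (ob_get_level()) {
    ob_end_clean();
}
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename=file.zip');
readfile('/path/to/file.zip');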

Basj
  • Why don't you just benchmark both solutions? – akond Aug 31 '17 at 19:51
  • If you want to be sure, test it. – Patrick Q Aug 31 '17 at 19:51
  • I thought this was probably well-known @akond, and would be a useful answer for future reference. Also, I'm not enough of a Linux-benchmarking-tools connoisseur to do a precise, meaningful test. – Basj Aug 31 '17 at 20:00
  • Can you please specify the reasons you're considering PHP to process the file download in your question? Effectively the `header` calls are superficial to the server, as they are instructions sent to the client, used to describe how the client should handle the response. Assuming `readfile` is the method used, the only `header` to cause "slowness" would be `content-length` due to the use of `filesize` requiring a system call, and is not a required `header`. The rest of the headers could theoretically be defined in `.htaccess` and have the same effects as in PHP. – Will B. Apr 28 '19 at 23:03
  • @fyrye: I'm using PHP to 1) log in my own database that the file with token ID `f6zDaq` (associated to a user) has well been downloaded 2) check if the token ID matches a user in the database before serving the file... If there are other ways than PHP to do this (directly with Apache / .htaccess) I'm interested too. – Basj Apr 29 '19 at 09:21
  • There are a few alternatives, my main concern would be *disk I/O* and *bandwidth* utilization. CPU and memory impact would be negligible as long as you're not using output buffering, `stream_get_contents` or `file_get_contents` to serve the file. The rest of the `header` calls won't affect the server side at all and can even be defined using `.htaccess`, for example to disable caching `Header set Cache-Control "max-age=0, no-cache, no-store, must-revalidate"`, etc. So the answer is that the *best practice* is to use `readfile`, everything else would be of opinion or circumstantial – Will B. Apr 29 '19 at 13:35
  • I guess my next questions would be surrounding user authentication and file access, Are you using sessions? How are you planning to handle the session during the file request? Are the files stored in a web accessible directory? If so how are they protected? How many users are expected to send a file request concurrently? Are you opposed to the use of cookies and redirects as an alternative to `readfile`? – Will B. Apr 29 '19 at 13:50
  • As long as the documentation at https://www.php.net/manual/en/function.readfile.php doesn't report any memory issues, I think the CPU load will stay reasonable without overloads. – Creative87 May 03 '19 at 02:32
  • the best answer is in this duplicate: https://stackoverflow.com/a/3731639/3749523 – Sjon May 03 '19 at 08:31
  • @Sjon Thank you for this link. It seems to require an additional module (xsendfile) which I'd like to avoid (I'd like to keep as simple as possible). – Basj May 03 '19 at 08:52
  • @Basj you can't keep it simple if your webserver doesn't support certain features. If you'd use nginx you should checkout [X-Accel](https://www.nginx.com/resources/wiki/start/topics/examples/x-accel/) – Sjon May 03 '19 at 08:58

6 Answers

6

You could use .htaccess to redirect the request to the file while keeping the permalink structure:

RewriteEngine On
RewriteBase /
RewriteRule ^download\/([^\/]+)\/file\.zip$ download.php?id=$1 [L,NC]

Then in your download.php, you can check if the provided id is valid:

// Path to file
$file = 'file.zip';

// If the ID is valid
if ($condition) {
    header("Content-Disposition: attachment; filename=\"" . basename($file) . "\"");
    header("Content-Type: application/force-download");
    header("Content-Length: " . filesize($file));
    header("Connection: close");
    readfile($file); // actually send the file contents; the headers alone deliver nothing
} else {
    // Handle invalid ids
    header('Location: /');
}

When the user visits a valid URL http://example.com/download/f6zDaq/file.zip, the download will start and the connection will be closed.

If the user visits an invalid URL, they will be redirected to the home page.

Chin Leung
  • @Basj You can take a look at the second rule for restricting ids. – Chin Leung Aug 31 '17 at 20:10
  • @Basj To be fair, your question was really limited to "I wanted to be 100% sure that it won't be slower / more CPU/RAM hungry than pure Apache solution". You didn't mention security concerns. – Patrick Q Aug 31 '17 at 20:19
  • @Basj What do you mean where? It's in the `download.php`. The headers are passing the file and it's the content-type that's forcing the download. – Chin Leung Aug 31 '17 at 21:02
  • The headers of `Content-Disposition: attachment;` are telling the browser that it should be downloaded and the browser is simply reading the data of the filename that you pass with it. – Chin Leung Aug 31 '17 at 21:11
  • @ChinLeung: I cleaned my old obsolete comments. Last question: in all other variants (see updated question), there is always: 1) headers, and then 2) `readfile` or `echo file_get_contents("file.zip")`. 2) is there to actually deliver the file. How does your solution deliver the file without `readfile` nor `echo file_get_contents(...)`? Thank you in advance. – Basj Apr 27 '19 at 13:41
  • @Basj In the headers we are specifying the name of the file and the browser simply retrieves it from the server. – Chin Leung Apr 29 '19 at 19:34
4

The biggest problems you're going to face with files of those sizes are the following:

  • people downloading it with a download manager
  • interrupted connections

Normally, keep-alive can be a bad idea, as it dedicates a connection to a download, which can bog down your network connections instead of letting them be freed up quickly. However, if you expect all of your files to be large, it's your friend, because you don't want people restarting those downloads: keep-alive gives those downloads reliable connections that are easier for the client to resume, which helps reduce attempts to re-download massive files.

As such, of your presented options, I recommend

1ter

However, like others here, I still recommend that you test your solutions, preferably from a location separate from the one serving the files.

Addendum: That said, serving with PHP isn't the best idea unless you need header-control or access-control logic that .htaccess can't provide, because it just adds processing overhead. By far the better path is simply to have the files in an accessible directory; .htaccess can rewrite access to files and folders, not just PHP scripts.

To create Apache-based protected download folders instead:

Options +FollowSymLinks
RewriteEngine On
RewriteRule ^/user/files/folder1.*$ http://example.com/userfiles/ [R=301,L]

Then, if you need to password-protect it, use Apache instead of PHP (Apache is already installed with most PHP installations). You do this by including a .htaccess file in the targeted folder (if you're creating users dynamically, you might need a script to generate these for each new user) and making sure Apache is set up to handle passwords:

AuthType Basic
AuthName "Authentication Required"
AuthUserFile "/user/password/.htpasswd"
Require valid-user

(See here for more detail: Setting up Apache Passwords)

After this point, make sure you have an .htpasswd file in the password directory, with entries in the format username:hashedpassword.

e.g.:

andreas:$apr1$dHjB0/..$mkTTbqwpK/0h/rz4ZeN8M0
john:$apr1$IHaD0/..$N9ne/Bqnh8.MyOtvKU56j1
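
Entries like these can be generated with Apache's htpasswd utility (a sketch; the path matches the AuthUserFile above):

htpasswd -c /user/password/.htpasswd andreas   # -c creates the file; use it only for the first user
htpasswd /user/password/.htpasswd john         # add or update further users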

Now, assuming you don't want them to type the password every single time, you can include the credentials in the download link:

<a href="http://user:pass@example.com/userfiles/myCoolZip.zip">Link (hopefully behind a password-protected interface.)</a>

[Note: Do NOT use the direct password link method if passwords are not randomly assigned per file.]

OR, if you're populating users based off of the root Apache password management AND your site uses Apache for its login process, they might not need the user:pass part of the link at all, having already logged in with Apache.

NOTICE:

Now, this said, the files will be accessible to anyone the full link (with username/password) is shared with. So they'll be as secure (or as insecure) as your server's https (or http, if you allow it) protocol, and as your users' willingness to share or not share links.

Doing it this way, the files will be open to the users they're meant for, with the full capabilities of the web available to them, meaning download helpers, browser plugins that help, REST calls, and more, depending on your users' use cases. This can reduce security, which may or may not be a big deal depending on what you're hosting. If you're hosting private medical data (few users, high security, lower speed demands), I wouldn't do it this way. If you're hosting music albums (many users, lower security, high speed demands), I totally would.

lilHar
  • Thank you for your answer @liljoshu. Just to be sure, do you recommend to use `header('Connection: Keep-Alive');`? If so, you probably mean *1ter* in your answer, instead of 1bis? – Basj Apr 29 '19 at 19:11
  • Yes, I'll fix that. I misread title with codeblock, apologies. – lilHar Apr 29 '19 at 19:12
  • Thank you @liljoshu. About your addendum, could you post a solution without PHP, but just .htaccess? (NB: I need to give access to files only if the user is using a specific token, and I need to log, in a custom database, if the file has been downloaded or not for each user; that's why I was using PHP) – Basj Apr 29 '19 at 19:38
  • @Basj I hope that helps. – lilHar Apr 29 '19 at 21:58
3

I would go with readfile(). I've used it for years and never had memory issues, even running on a 128 MB VPS.

Using PHP means you can easily handle authentication, authorization, logging, adding and removing users, expiring URLs and so on (a minimal sketch is below). You could use .htaccess to do all that, but you would have to build a rather large structure to handle it.
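
For illustration, a minimal sketch of such a download.php; the downloads table, its columns, and the $pdo connection are hypothetical, not from the question:

<?php
// Hypothetical lookup: does this token belong to a known customer?
$stmt = $pdo->prepare('SELECT id FROM downloads WHERE token = ?');
$stmt->execute([$_GET['id'] ?? '']);
$row = $stmt->fetch();

if (!$row) {
    http_response_code(404); // unknown token: no file for you
    exit;
}

// Hypothetical logging that this customer has downloaded the file
$pdo->prepare('UPDATE downloads SET downloaded_at = NOW() WHERE id = ?')
    ->execute([$row['id']]);

$path = '/path/to/file.zip';
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="file.zip"');
header('Content-Length: ' . filesize($path));
while (ob_get_level()) { ob_end_clean(); } // per the readfile() docs: disable output buffering
readfile($path);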

ThoriumBR
  • 930
  • 12
  • 25
  • Thank you for your answer. I edited the question, which variant of the headers would you use? (or maybe could you include your headers in the answer?) – Basj Apr 27 '19 at 13:44
1

You can use X-Accel-Redirect when your web server is Nginx. For Apache, it's mod_xsendfile with the X-Sendfile header.

<?php
// nginx sees this header and serves the file at this internal URI itself
header('X-Accel-Redirect: /download/f6zDaq/file.zip');

It costs less and performs better, because the web server handles the file itself.
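
For X-Accel-Redirect to work, nginx must expose the real file location as an internal-only location; a minimal sketch, with assumed paths:

location /download/ {
    internal;                            # only reachable via X-Accel-Redirect, not by clients
    alias /home/www/example.com/files/;  # /download/f6zDaq/file.zip -> .../files/f6zDaq/file.zip
}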

Dai Jie
0

Memory- and CPU-wise, you should probably go with readfile() or write some custom code using fopen() and fread() with a custom buffer size (sketched below).
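
A minimal sketch of the fopen()/fread() approach; the path and the 8 KB buffer size are assumptions to adjust:

<?php
$path = '/path/to/file.zip';
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="file.zip"');
header('Content-Length: ' . filesize($path));

$fp = fopen($path, 'rb');
while (!feof($fp)) {
    echo fread($fp, 8192); // stream one buffer-sized chunk at a time
    flush();               // push it to the client; memory use stays constant
}
fclose($fp);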

Regarding the headers you send: they do not impact the performance of the script; they just instruct the client what to do with the server's response (in your case, the file). You can Google each header and see exactly what it does.

You should probably have a look at this: Is there a good implementation of partial file downloading in PHP?. The things that might interest you there: download-range and download-resume support, and ways to do this using web server plugins, PEAR packages or libraries that offer the functionality you need.

Luxian
0

As mentioned in Fastest Way to Serve a File Using PHP, I finally did this:

apt-get install libapache2-mod-xsendfile
a2enmod xsendfile  # (should be already done by previous line)

Then I added this in apache2.conf:

<Directory />
  AllowOverride All
  Require all granted
  XSendFile on
  XSendFilePath /home/www/example.com/files/
</Directory>

I then did service apache2 restart and included this in .htaccess:

RewriteRule ^(.*)$ download.php?file=$1 [L,QSA]

and this in the download.php:

header("X-Sendfile: /home/www/example.com/files/hiddenfolder_w33vbr0upk80/" . $file);
header("Content-type: application/octet-stream");
header('Content-Disposition: attachment; filename="' . $file . '"');

NB: strangely, even though I have AllowOverride All enabled in the apache2.conf VirtualHost, putting this:

XSendFile on
XSendFilePath /home/www/example.com/files/

only in the /home/www/example.com/.htaccess or /home/www/example.com/files/.htaccess file didn't work (it fails with XSendFilePath not allowed here).

Benchmark:

  • 10.6% CPU while downloading, exactly as when the file is downloaded directly from Apache (no PHP at all), so it's all good!
Basj