
I am working on a web application. It will let users download files from the server over HTTP. The files can be up to 4 GB in size.

These are my requirements and constraints:

  • HTTP File Download Progress in %
  • Register when the HTTP file download finishes
  • Register if the HTTP file download crashed
  • Register if the user cancelled the download
  • Resume unfinished file downloads
  • Be able to download files up to 4 GB
  • Must be implemented using only JavaScript/HTML5 on the client side and PHP on the server.
  • May not be implemented in Java or Flash on the client side.

My Development Environment:

  • Apache
  • PHP
  • MySQL
  • Windows 7

My problem is that although I have already written a PHP script that can download large files, I cannot reliably monitor aborted downloads (browser closed, download cancelled, internet connection dropped). The PHP function connection_aborted() catches only about 50% of all aborted downloads.

My question therefore is: is there any way at all to really efficiently and precisely monitor download progress and aborted downloads? What about using the NGINX or LIGHTTPD web servers? What about writing my own Lua or Perl module for Apache, where I would monitor the PHP output buffer?

My Current Download Script:

    while (!feof($fileObject))
    {
        // brief pause each iteration; throttles the loop and the DB updates below
        usleep(100000);

        $chunk = @fread($fileObject, $chunkSize);
        echo $chunk;

        // flush the output buffer gradually to avoid memory problems with large files
        ob_flush();
        flush();

        // check if the client was disconnected
        // important for cancelled or interrupted downloads
        if (connection_aborted())
        {
            // record in the database that the connection has been aborted
            mysqli_query($dbc, "UPDATE current_downloads SET connection_aborted=TRUE WHERE user_id=1;");

            // close the database connection and the open file
            mysqli_close($dbc);
            @fclose($fileObject);

            exit(json_encode(array("result" => false, "error" => "Connection with the client was aborted.")));
        }

        $nLoopCounter++;
        $transferred += strlen($chunk); // count the bytes actually read; the last chunk may be shorter
        $downloadPercentage = ($transferred / $fileSize) * 100;

        $result = mysqli_query($dbc, "UPDATE current_downloads SET progress_percent=$downloadPercentage, transferred=$transferred, iteration=$nLoopCounter WHERE user_id=1;");
        if ($result === false)
        {
            // close the database connection and the file
            mysqli_close($dbc);
            fclose($fileObject);

            // output an error message and stop
            echo json_encode(array("result" => 0, "message" => "Error Processing Database Query"));
            exit;
        }
    }

Thank you.

Bunkai.Satori
  • I see that although I have set a bounty of 50 points on my question, I have received three down-votes. May I kindly ask what is wrong with my question? It will help me formulate my questions better in the future. – Bunkai.Satori Feb 16 '14 at 14:39
  • Are you sure that PHP isn't aborting your script when a client disconnects or at timeout? You must use one of the methods from http://www.php.net/manual/en/features.connection-handling.php to be sure. (ignore_user_abort or register_shutdown_function) – Martin Feb 17 '14 at 14:16
  • Hi Martin, I am completely positive about this. Firstly, I have set `ignore_user_abort(true);` at the beginning of the script. Secondly, after I cancel the download or close the browser, I monitor my MySQL current_downloads table. It still keeps updating, which means the loop still runs. However, I've just uninstalled WAMP and installed EasyPHP. So far, `connection_aborted()` always detects when I manually cancel the download. However, closing the browser window remains undetected, which is just as bad. – Bunkai.Satori Feb 17 '14 at 14:41

4 Answers


Taking your requirement constraints into account, I'd say it is impossible (at least to cover 100% of the browsers) for various reasons (see a "hacky" solution below):

You can display the download progress by frequently polling a second page that returns the %-value your download script stores in the database. However - as you already noticed - PHP does not offer reliable methods to determine whether a user has aborted or not.
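
To illustrate, that second page can be a few lines of PHP. A minimal sketch, assuming the current_downloads table from the question; the file name progress.php and the connection credentials are placeholders of mine:

    <?php
    // progress.php - minimal sketch of the polling endpoint described above.
    // Table and column names are taken from the question; the connection
    // credentials are placeholders.
    $dbc = mysqli_connect("localhost", "user", "password", "mydb");

    $result = mysqli_query($dbc,
        "SELECT progress_percent, transferred, connection_aborted
         FROM current_downloads WHERE user_id = 1");
    $row = $result ? mysqli_fetch_assoc($result) : null;

    echo json_encode($row ? $row : array("error" => "no active download"));
    mysqli_close($dbc);
    ?>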

To bypass this problem you could do the following:

Create a download.php file that is able to return files in chunks. Write a JavaScript routine that iteratively pulls all available chunks until the download is finished (i.e. download.php?fileId=5&chunk=59). The JavaScript can then combine all retrieved chunks and finally render the completed file.

However, with JavaScript you cannot write directly to the hard disk, which means you need to download all chunks before you can present the user with a "finished file". If the user stops in between, all the data is lost, which violates your constraint of being able to resume downloads.

Since resuming file downloads is a task that has to be implemented on the client side (you somehow need to pick up the already downloaded data), you cannot do anything about this on the server side. And with JavaScript lacking the ability to write (or read) the hard disk directly, it is impossible with only PHP/JavaScript. (In fact there ARE filesystem functions in JavaScript, but in general no browser allows them for remote sites.)


As a hacky solution, you can abuse the browser cache for resuming file downloads:

Note, that there are various cases, when this does not work:

  • Users may have disabled browser cache.
  • Browsers may render files "outdated" on their own.
  • Browsers may simply ignore your cache advice.

However, with this solution, the worst case will be that the caching / resuming does not work.

Implement a download.php as mentioned above. The following example uses a fixed chunk count of 10, which you may want to adapt to your needs (or switch to a fixed chunk size; adjust the calculations as required).

<?php

// allow the browser to cache each chunk; since the chunk number is part of
// the URL, every chunk is cached independently
header('Cache-Control: max-age=31556926');
$etag = 'a_unique_version_string';
header('ETag: "'.$etag.'"');

$chunkCount = 10;

$file = $_GET["file"]; //ignored in this example
$file = "someImage.jpg";
$chunk = (int)$_GET["chunk"];

$fileSize = filesize($file);
$chunkSize = ceil($fileSize / $chunkCount); //round up to whole bytes

//byte range of the requested chunk
$start = ($chunk - 1) * $chunkSize;
$toRead = min($chunkSize, $fileSize - $start); //full chunk, or the rest until EOF

$handle = fopen($file, "rb"); //binary mode matters on Windows

if (fseek($handle, $start) == 0){
  echo fread($handle, $toRead);
}else{
  //error seeking: handle it.
}
@fclose($handle);

?>

Now, any client can download chunks by calling a URL (I set up a demo on my server) like this:

downloading http://dog-net.org/dltest/download.php?file=1&chunk=1
downloading http://dog-net.org/dltest/download.php?file=1&chunk=2
downloading http://dog-net.org/dltest/download.php?file=1&chunk=3
downloading http://dog-net.org/dltest/download.php?file=1&chunk=4
downloading http://dog-net.org/dltest/download.php?file=1&chunk=5

Individual chunks are worthless on their own, so the aforementioned JavaScript comes into play. The following snippet can be generated when a download is invoked. It will then iterate over all required chunks and download them one by one. If the user aborts, the browser will still have the individual chunks cached. Meaning: whenever the user starts the download again, already downloaded chunks will finish within a split second, and chunks not yet requested will be downloaded regularly.

<html>
  <head>   
    <script type="text/javascript">
      var urls = new Array();
      urls[0] = "http://dog-net.org/dltest/download.php?file=1&chunk=1";
      urls[1] = "http://dog-net.org/dltest/download.php?file=1&chunk=2";
      urls[2] = "http://dog-net.org/dltest/download.php?file=1&chunk=3";
      urls[3] = "http://dog-net.org/dltest/download.php?file=1&chunk=4";
      urls[4] = "http://dog-net.org/dltest/download.php?file=1&chunk=5";
      urls[5] = "http://dog-net.org/dltest/download.php?file=1&chunk=6";
      urls[6] = "http://dog-net.org/dltest/download.php?file=1&chunk=7";
      urls[7] = "http://dog-net.org/dltest/download.php?file=1&chunk=8";
      urls[8] = "http://dog-net.org/dltest/download.php?file=1&chunk=9";
      urls[9] = "http://dog-net.org/dltest/download.php?file=1&chunk=10";

      var fileContent = new Array();


      function downloadChunk(chunk){
        var url = urls[chunk-1];
        console.log("downloading " + url);
        var xhr = new XMLHttpRequest();
        xhr.open("GET", url, true);
        xhr.responseType = 'blob';
        xhr.onload = function (e) {
          if (xhr.readyState === 4) {
            if (xhr.status === 200) {
              document.getElementById("log").innerHTML += "downloading " + url + "<br />";
              fileContent.push(xhr.response); 
              document.getElementById("percentage").innerHTML = chunk / urls.length * 100;

              if (chunk < urls.length){  
                downloadChunk(chunk+1);
              }else{
                finishFile();
              }
            } else {
              console.error(xhr.statusText);
            }
          }
        };
        xhr.onerror = function (e) {
          console.error(xhr.statusText);
        };
        xhr.send(null);
      }

      function finishFile(){
         var contentType = 'image/jpg'; //TODO: has to be set accordingly!
         console.log("Generating file");
         var a = document.createElement('a');
         var blob = new Blob(fileContent, {'type':contentType, 'endings':'native'});

         console.log("File generated. size: " + blob.size);

         //Firefox
         if (navigator.userAgent.toLowerCase().indexOf('firefox') > -1){
            var url = window.URL.createObjectURL(blob);
            window.location = url;   
         }

         //Chrome or IE 11?
         if (window.chrome){
           //Chrome:
           a.href = window.URL.createObjectURL(blob);
           a.download = "download";
           a.click();
         }else if (!(window.ActiveXObject) && "ActiveXObject" in window){
           //IE 11: window.ActiveXObject is hidden, but the name still exists
           window.navigator.msSaveOrOpenBlob(blob, 'download');
         }
      } 


      function setProgress(chunk){
        document.getElementById("percentage").innerHTML = chunk / urls.length * 100;
      }
    </script>
  </head>
  <body onload="downloadChunk(1);">

  <div id="percentage"></div>
  <div id="log"></div>

  </body>
</html>

Note that the handling of Blobs is a pain in the ... I managed to get it working in IE 11, Chrome 32 and Firefox 27. No way for Safari so far. Also, I did NOT check older versions.


Demo: http://dog-net.org/dltest/ (it's a PNG image, so open it with Paint/IrfanView/whatever; the file extension is not set.)

On the first call, all file chunks will be downloaded independently. On the second call you will notice that they finish pretty quickly, because all the (already completed) calls have been cached by the browser. (I set the cache time to "forever". In practice you don't want to do this; pick something like 7 days!)
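
One detail the download.php snippet above leaves out: when the browser revalidates a cached chunk, the script re-sends the whole body. A small, hedged addition near the top of the script (reusing the $etag value it already sets) would let revalidation requests finish with a cheap 304; whether the browser revalidates at all depends on its cache heuristics:

    // hypothetical addition near the top of download.php: answer
    // revalidation requests without re-sending the chunk body
    if (isset($_SERVER['HTTP_IF_NONE_MATCH']) &&
        trim($_SERVER['HTTP_IF_NONE_MATCH'], " \t\"") === $etag) {
        header('HTTP/1.1 304 Not Modified');
        exit;
    }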


Things you would need to do on your own:

  • Generate the required JavaScript download code (second snippet).
  • Add finishFile implementations for older browser versions.
  • Check large files. (I only tested up to 30 MB.)
  • Pass the correct MIME type to the snippet where required.
  • Adapt it to your UI styling.
  • Ensure files have a proper extension set.

This is just a thought that might give you some ideas of how to implement this.

However, I strongly recommend using a client-side implementation based on Flash/Java/Silverlight, so you have a failsafe implementation that does not depend on browser versions or any other limitation!

dognose
  • hi and thank you for your very detailed answer (+1). The solution I have so far seems to be a bit cleaner. I download the file in chunks. My PHP script updates a MySQL database with download progress info. I then regularly send AJAX calls to the server to get the download progress. So everything seems pretty functional. The problem is that I do not always get notified about an aborted download, so I cannot do any cleanup on the server side. – Bunkai.Satori Feb 21 '14 at 14:11
  • @Bunkai.Satori You could track the AJAX requests for progress. If you pull new data every 30 seconds, and 60 seconds have passed without a request, the user most likely aborted the download (see the watchdog sketch after these comments). – dognose Feb 21 '14 at 17:48
  • On a side note: You are *not* delivering the file in chunks. You are reading the file into the memory of your server in chunks. What you present to the user is a finished file, delivered in a single request. – dognose Feb 21 '14 at 17:56
  • hi, and yes, what you write makes sense. My AJAX calls are set up at exactly 0.5-second intervals. I am a bit afraid of this tactic of monitoring the amount downloaded. It can happen that the download process is simply so slow that, especially with large files, no change will be recorded. Imagine a file only 500 KB large. On a good-speed internet connection, the file will be downloaded in a couple of seconds, and with those 30-second AJAX calls, monitoring progress would not be very precise. – Bunkai.Satori Feb 21 '14 at 19:14
  • hi mate, and thank you very much for your detailed answer. I agree, you have done a really great job, and it was difficult for me to decide whom to award my bounty. Your answer is based on the fact that a clean solution to my problem cannot be found, and various hacking techniques need to be implemented, which probably will not work for the majority of browsers currently in use. On the other hand, ClosetGeek in the message below offers a verified solution which he has implemented in the past with working results. I have therefore decided to award his answer my bounty. – Bunkai.Satori Feb 22 '14 at 21:50
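
The heartbeat timeout dognose suggests in the comments above could be sketched as a periodic watchdog query. This is only an illustration; the last_update column is my assumption and does not exist in the question's schema:

    <?php
    // watchdog.php - minimal sketch of the heartbeat timeout suggested in
    // the comments. Run it e.g. from a cron job or scheduled task.
    // ASSUMPTION: current_downloads has a last_update timestamp column that
    // the download script refreshes on every progress UPDATE.
    $dbc = mysqli_connect("localhost", "user", "password", "mydb");

    mysqli_query($dbc,
        "UPDATE current_downloads
         SET connection_aborted = TRUE
         WHERE connection_aborted = FALSE
           AND last_update < NOW() - INTERVAL 60 SECOND");

    mysqli_close($dbc);
    ?>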

My final solution to PHP's connection-related problems was to create a web server using Boost.Asio and a little-known thread-safe SAPI released by Facebook. The download link is broken, but it can be found on GitHub here.

The main problem that I experienced while trying to make it work using Apache and other web servers was an inconsistency between the existing SAPIs (FastCGI, PHP-FPM, mod_php, etc.) and the connection-related functions in PHP. They simply were not reliable in any situation that I tried, although many others claim to have gotten them to work with their specific configuration (OS version, web server version, SAPI version, PHP version, etc.).

The main problem (as you've observed) is that PHP is significantly isolated from Apache and other web servers. By using an embedded PHP SAPI you are able to have a greater level of cooperation between PHP and the actual socket connections, as well as other network-related functions. This is the only way that I have been able to get PHP to work hand in hand with a web server, which is very much what you're needing.

However, on a second note, there are many serious pure-PHP services surfacing now that PHP has mostly fixed its garbage collection issues. A simple file server could easily be made using non-blocking sockets or PHP streams, and it would likely be fast, considering that it would be serving static content using an async pattern.
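
To illustrate that last point, a non-blocking socket loop in plain PHP might look like the following. This is only a sketch of the stream_select() pattern, not a complete file server, and all names in it are mine:

    <?php
    // Minimal sketch of a non-blocking PHP socket loop, as suggested above.
    // It is not a complete file server: it only shows the stream_select()
    // pattern that lets one process watch many client connections at once.
    $server = stream_socket_server("tcp://0.0.0.0:8000", $errno, $errstr);
    $clients = array();

    while (true) {
        $read   = array_merge(array($server), $clients);
        $write  = NULL;
        $except = NULL;

        // wait up to 1 second for sockets that have become readable
        if (stream_select($read, $write, $except, 1) === false) {
            break;
        }

        foreach ($read as $sock) {
            if ($sock === $server) {
                // new client connection
                $clients[] = stream_socket_accept($server);
            } elseif (feof($sock)) {
                // the client disconnected - exactly the event that is so
                // hard to observe reliably from PHP behind Apache
                unset($clients[array_search($sock, $clients, true)]);
                fclose($sock);
            } else {
                $request = fread($sock, 8192);
                // parse the request and stream the next file chunk here
            }
        }
    }
    ?>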

I wouldn't mind posting some Boost.Asio tidbits or a simple PHP file service if you feel this is the direction your solution needs to move in. It is definitely possible; many thousands of services have run into this problem already.

JSON
  • @ClosetGeek, hi and thanks for finding time to compose this answer (+1). I checked Boost.Asio and it looks interesting. So you believe that using WebSockets or PHP streams is the way to go. However, to my knowledge, there is a limitation on the number of simultaneously open WebSocket connections. The limitation can be slightly increased, but the number will still be quite low. We are talking about 50-100 simultaneous connections. Have you had to deal with that? – Bunkai.Satori Feb 22 '14 at 13:49
  • @Bunkai.Satori - I'm not aware of the connection limitation and couldn't find any mention of it. PHP stream buffers are limited in size, so I can see where this could become an issue, but [php_socket](http://www.php.net/manual/en/book.sockets.php) shouldn't have the same limitations. I know of a PHP socket chat server that can handle 1,000+ connections, and I know from personal experience that Boost.Asio can handle 10,000+. – JSON Feb 22 '14 at 21:35
  • Dognose wrote a really complete answer, nicely formatted, clean, with a lot of text. However, I am looking for a clean solution, and his is based on the assumption that what I need cannot be done cleanly and that various hacking solutions are needed. On the other hand, you present a solution which is clean. You have tested it, and although I do not have enough time to test everything before the bounty expires, I award the bounty to you. I have enough material to start with. Thank you for finding time and providing me with an answer. Enjoy my 50 points :-) – Bunkai.Satori Feb 22 '14 at 21:46

You can implement the solution using HTML5 WebSockets.

There are client libraries (built using JavaScript) that abstract out the API in an easy to use way.

There are server side libraries (built using PHP) that implement a WebSocket server.

This way, you can have bi-directional communication and you can capture on the server side, all the possible events that you have mentioned.

Due to shortage of time, I am not providing code, but hopefully this gives some direction.
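
For illustration, a minimal sketch of what such a server-side tracker could look like. The choice of the Ratchet library (cboden/ratchet) is my assumption; the answer itself does not name one:

    <?php
    // Minimal sketch of a WebSocket download tracker. ASSUMPTION: the
    // Ratchet library (composer require cboden/ratchet) is used; the
    // answer does not name a specific library.
    require 'vendor/autoload.php';

    use Ratchet\ConnectionInterface;
    use Ratchet\MessageComponentInterface;
    use Ratchet\Http\HttpServer;
    use Ratchet\Server\IoServer;
    use Ratchet\WebSocket\WsServer;

    class DownloadTracker implements MessageComponentInterface {
        public function onOpen(ConnectionInterface $conn) {
            // the download page opened its socket: mark the download active
        }
        public function onMessage(ConnectionInterface $from, $msg) {
            // client reports progress, e.g. {"user_id":1,"percent":42}
            // -> update the current_downloads table here
        }
        public function onClose(ConnectionInterface $conn) {
            // socket closed without a final "finished" message: the window
            // was closed or the connection dropped -> mark as aborted
        }
        public function onError(ConnectionInterface $conn, \Exception $e) {
            $conn->close();
        }
    }

    IoServer::factory(new HttpServer(new WsServer(new DownloadTracker())), 8080)->run();
    ?>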

Software Guy
  • hi and thanks for your great response (+1). I was already thinking about this. I was thinking about a COMET-style persistent connection between client and server. However, I ran into a limitation, in that the Apache server offers only a limited number of persistent WebSocket connections. I would then probably need to go with LIGHTTPD or NGINX. Would you possibly have more information on this topic, please? – Bunkai.Satori Feb 21 '14 at 22:50
  • hi Bunkai, you're welcome. A google search on 'file download websocket' returned some interesting results, out of which this might be useful: http://mustafaakin.wordpress.com/2011/10/16/introducing-websocket-file-transfer/ - apologies if that doesn't help enough – Software Guy Feb 21 '14 at 22:57
  • I wouldn't suggest using websocket for file transfers yet. Browser support for file handling is very limited, and none of the browsers have a consistent API yet for handling binary data over websockets. It will likely become a viable option in time, but that's still a while off. – JSON Feb 22 '14 at 21:40

In reality, there is no way with PHP (which is a server-side, not a client-side, language) to truly detect when a file download has completed. The best you can do is log the download in your database when it begins. If you absolutely, completely and totally need to know when the download has completed, you'll have to do something like embed a Java applet or use Flash. However, usually that is not the correct answer in terms of usability for your user (why require them to have Java or Flash installed just to download something from you?).

From here.

You can still try to learn a little more about ignore_user_abort and connection_aborted. They might somehow fit what you need. But you will not get a really efficient and precise way to monitor whether the download was actually completed.
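
For reference, a minimal sketch of how those two functions combine with register_shutdown_function (the method Martin also pointed to in the question's comments); $fileObject and $chunkSize are assumed to be set up as in the question's script:

    <?php
    ignore_user_abort(true); // keep the script alive after the client is gone

    register_shutdown_function(function () {
        // runs when the script ends for any reason (finished, aborted, fatal)
        if (connection_aborted()) {
            // log the aborted download here, e.g. UPDATE current_downloads ...
        }
    });

    while (!feof($fileObject)) {
        echo fread($fileObject, $chunkSize);
        ob_flush();
        flush(); // output must actually reach the client, otherwise
                 // connection_aborted() has nothing to detect
        if (connection_aborted()) {
            break; // stop streaming; the shutdown handler does the logging
        }
    }
    ?>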

Patrick Bard
  • Hi Patrick. Thanks for your response. Well, there must be an efficient way to monitor downloads even without Java. Just take a look at file-sharing portals such as Rapidshare.com (just an example). They must be aware of the download progress on the user side, and they do it all without requiring the user to install Java. – Bunkai.Satori Feb 17 '14 at 16:01
  • I don't think that it is impossible. But in your example, a "great" team of people achieved a way to do that. It's not something very simple where you just call some functions and magic happens... it might be a hard road if you really want to achieve that. – Patrick Bard Feb 17 '14 at 16:04
  • well, to my information, all those portals were established by single individuals, whether rapidshare.com, megaupload.com, hotfile.com, uploading.com, etc... I think it should not be that difficult. It is just a matter of finding the correct solution to the given problem. – Bunkai.Satori Feb 17 '14 at 16:15
  • @Bunkai.Satori I am not an expert, I am still learning a lot, thus I can't say much about it. But if you have experience, as you appear to, there is nothing preventing you from trying. I am just saying that it might not be as easy as it looks. – Patrick Bard Feb 17 '14 at 17:08
  • I have to tell you, I am thankful for your advice. At this moment it looks like there is something wrong with WAMP/EasyPHP. When I remove one and install the other, the behavior of `connection_aborted()` always changes a bit. But it is too early to say more. Do not take my words as criticism, though. I simply cannot give up this project and have to find a stable solution :-) – Bunkai.Satori Feb 17 '14 at 17:12
  • I just showed you the main idea of people who have already tried, and it looks like they were not really lucky, or at least not determined enough. Try hard, and report back if you find any solution :) – Patrick Bard Feb 17 '14 at 17:20
  • Yes, I have to admit, I have read tons of posts where people had issues similar to mine. Well, I do not have any other choice but to find the solution. – Bunkai.Satori Feb 17 '14 at 17:24
  • @PatrickBard - it's generally a bad idea to repost things you don't understand. There are no standard 'just-add-water' solutions for this but it can be done in multiple ways – JSON Feb 19 '14 at 00:10
  • @ClosetGeek, hi ClosetGeek. Would you be so kind as to share a couple of ideas with me, please? It looks like you are pretty sure about what you say. I would really appreciate getting some help on this topic. Thanks in advance. – Bunkai.Satori Feb 19 '14 at 11:43
  • @Bunkai.Satori - I will be able to help later in the day. I had to deal with this issue a few years ago when making a paid download extension for a Xoops CMS user. – JSON Feb 19 '14 at 12:52
  • @ClosetGeek, ah, that is great. If you could answer here, please, so I can grant you those 50 points of bounty. I will gladly inform you in detail how far I am, what I have done so far, and what works and does not work for me. I would really be thankful if you do not forget me. – Bunkai.Satori Feb 19 '14 at 13:39
  • @ClosetGeek, Hi ClosetGeek. Do you think you could provide me with some information on this subject, please? Tomorrow my bounty expires, and I would like to have it used for something meaningful, not just wasted. What is more important, however, is that I would like to find a reliable solution to my problem. – Bunkai.Satori Feb 20 '14 at 13:09
  • @Bunkai.Satori - Working on an answer now. Note that my answer will mainly address server side connection issues. Resuming failed downloads is not possible for the reasons listed by dognose. – JSON Feb 21 '14 at 07:45
  • @ClosetGeek, Hi, and thank you in advance. Just please be aware that the bounty expires in 10 hours. That means when it expires, it will be automatically awarded to whoever provided an answer, whether it was correct or incorrect. – Bunkai.Satori Feb 21 '14 at 13:57