
Ryan Dahl has said he invented NodeJS to solve the file upload progress bar problem (https://youtu.be/SAc0vQCC6UQ). Using only technology available in 2009 when Node was introduced (so before Express and the more advanced client-side JavaScript libraries that automagically report progress updates), how exactly did NodeJS solve this problem?

Trying to use just Core NodeJS now, I understand with the request stream I can look at the header, get the total file size, and then get the size of each chunk of data as it comes through, to tell me the percent complete. But then I don't understand how to stream those progress updates back to the browser, since the browser doesn't seem to update until request.end().

Once again, I want to wrap my head around how NodeJS originally solved this progress update problem. WebSockets weren't around yet, so you couldn't just open a WebSocket connection to the client and stream the progress updates back to the browser. Was there another client-side JavaScript technology that was used?

Here is my attempt so far. Progress updates are streamed to the server-side console, but the browser only updates once the response stream receives response.end().

var http = require('http');
var fs = require('fs');

var server = http.createServer(function(request, response){
    response.writeHead(200);
    if(request.method === 'GET'){
        fs.createReadStream('filechooser.html').pipe(response);     
    }
    else if(request.method === 'POST'){
        var outputFile = fs.createWriteStream('output');
        var total = request.headers['content-length'];
        var progress = 0;

        request.on('data', function(chunk){
            progress += chunk.length;
            var perc = parseInt((progress/total)*100);
            console.log('percent complete: '+perc+'%\n');
            response.write('percent complete: '+perc+'%\n');
        });

        request.pipe(outputFile);

        request.on('end', function(){
            response.end('\nArchived File\n\n');
        });
    }

});

server.listen(8080, function(){
    console.log('Server is listening on 8080');
});

filechooser.html:

<!DOCTYPE html>
<html>
<body>
<form id="uploadForm" enctype="multipart/form-data" action="/" method="post">
    <input type="file" id="upload" name="upload" />
    <input type="submit" value="Submit">
</form>
</body>
</html>

Here is an updated attempt. The browser now displays progress updates, but I'm pretty sure this isn't the actual solution Ryan Dahl originally came up with for a production scenario. Did he use long polling? What would that solution look like?

var http = require('http');
var fs = require('fs');

var server = http.createServer(function(request, response){
    response.setHeader('Content-Type', 'text/html; charset=UTF-8');
    response.writeHead(200);

    if(request.method === 'GET'){
        fs.createReadStream('filechooser.html').pipe(response);     
    }
    else if(request.method === 'POST'){
        var outputFile = fs.createWriteStream('UPLOADED_FILE');
        var total = request.headers['content-length'];
        var progress = 0;

        response.write('STARTING UPLOAD');
        console.log('\nSTARTING UPLOAD\n');

        request.on('data', function(chunk){
            fakeNetworkLatency(function() {
                outputFile.write(chunk);
                progress += chunk.length;
                var perc = parseInt((progress/total)*100);
                console.log('percent complete: '+perc+'%\n');
                response.write('<p>percent complete: '+perc+'%');
            });
        });

        request.on('end', function(){
            fakeNetworkLatency(function() {
                outputFile.end();
                response.end('<p>FILE UPLOADED!');
                console.log('FILE UPLOADED\n');
            });
        });
    }

});

server.listen(8080, function(){
    console.log('Server is listening on 8080');
});

var delay = 100; //stagger each chunk by an additional 100 ms so writes stay in order
var count = 0;
var fakeNetworkLatency = function(callback){
    setTimeout(function() {
        callback();
    }, delay*count++);
};
  • One thing to note is that, despite the call to response.write, the browser doesn't bother displaying anything until it has enough data, as this question points out: [link](http://stackoverflow.com/questions/14540335/node-js-i-cant-reproduce-progressive-response-from-server/14540584#14540584) – JohnnyFun Jul 21 '15 at 02:20
  • @JohnnyFun that makes sense. I'm still curious how Ryan solved sending progress updates to the browser, given the solution of forcing the browser to clear its buffer referenced in the link isn't a production solution. – HelpMeStackOverflowMyOnlyHope Jul 21 '15 at 02:30
  • This question has made an impact on my interests. I am now increasingly interested in getting to the bottom of how networks work. For that reason I've been learning C – Gilbert Mar 02 '22 at 17:16

2 Answers


Firstly, your code is indeed working; Node sends chunked responses, but the browser is simply waiting for more data before bothering to show anything.

More info in Node Documentation:

The first time response.write() is called, it will send the buffered header information and the first chunk of the body to the client. The second time response.write() is called, Node assumes you're going to be streaming data, and sends that separately. That is, the response is buffered up to the first chunk of the body.

If you set the content type to HTML with response.setHeader('Content-Type', 'text/html; charset=UTF-8');, Chrome will render the content as it arrives, but that only did the trick when I used a series of setTimeout calls with response.write calls inside; it still didn't update the DOM when I tried it with your code, so I dug deeper...

The trouble is that it's really up to the browser to render content when it sees fit, so I set up code to send ajax requests to check status instead:

Firstly, I updated the server to simply store its status in a global variable and open a "checkstatus" endpoint to read it:

var http = require('http');
var fs = require('fs');
var status = 0;

var server = http.createServer(function (request, response) {
    response.writeHead(200);
    if (request.method === 'GET') {
        if (request.url === '/checkstatus') {
            response.end(status.toString());
            return;
        }
        fs.createReadStream('filechooser.html').pipe(response);
    }
    else if (request.method === 'POST') {
        status = 0;
        var outputFile = fs.createWriteStream('output');
        var total = request.headers['content-length'];
        var progress = 0;

        request.on('data', function (chunk) {
            progress += chunk.length;
            var perc = parseInt((progress / total) * 100);
            console.log('percent complete: ' + perc + '%\n');
            status = perc;
        });

        request.pipe(outputFile);

        request.on('end', function () {
            response.end('\nArchived File\n\n');
        });
    }

});

server.listen(8080, function () {
    console.log('Server is listening on 8080');
});

Then, I updated the filechooser.html to check the status with ajax requests:

<!DOCTYPE html>
<html>
<body>
<form id="uploadForm" enctype="multipart/form-data" action="/" method="post">
    <input type="file" id="upload" name="upload"/>
    <input type="submit" value="Submit">
</form>

Percent Complete: <span id="status">0</span>%

</body>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>
    var $status = $('#status');
    /**
     * When the form is submitted, begin checking status periodically.
     * Note that this is NOT long-polling--that's when the server waits to respond until something has changed.
     * In a prod env, I recommend using a websockets library with a long-polling fallback for older browsers--socket.io is a gentleman's choice.
     */
    $('form').on('submit', function() {
        var pollInterval = setInterval(function () {
            $.get('/checkstatus').then(function (status) {
                $status.text(status);

                //when it's done, stop annoying the server
                if (parseInt(status) === 100) {
                    clearInterval(pollInterval);
                }
            });
        }, 500);
    });
</script>
</html>

Note that despite me not ending the response, the server is still able to handle incoming status requests.

So to answer your question, Dahl was fascinated by a flickr app he saw that uploaded a file and long-polled to check its status. The reason he was fascinated was that the server was able to handle those ajax requests while it continued to work on the upload. It was multi-tasking. See him talk about it exactly 14 minutes into this video--he even says, "So here's how it works...". A few minutes later, he mentions an iframe technique and also differentiates long-polling from simple ajax requests. He states that he wanted to write a server that was optimized for these types of behavior.
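For reference, here's a minimal sketch (my own illustration, not Dahl's original code) of what a true long-polling status check could look like: instead of answering every request immediately, the server holds the response open until the status actually differs from the value the client last saw.

```javascript
var status = 0; // shared state, updated by the upload's 'data' handler elsewhere

// Hold the "response" open (by not calling respond) until the shared
// status differs from the value the client last saw, then answer.
function longPollStatus(lastSeen, respond) {
    if (status !== lastSeen) {
        respond(status); // something changed: answer immediately
        return;
    }
    // otherwise check again shortly, keeping the response pending
    setTimeout(function () {
        longPollStatus(lastSeen, respond);
    }, 100);
}

// In an http handler you'd wire it up roughly like (hypothetical header name):
//   if (request.url === '/checkstatus') {
//       var lastSeen = parseInt(request.headers['x-last-status'] || '-1', 10);
//       longPollStatus(lastSeen, function (s) { response.end(String(s)); });
//       return;
//   }
```

The client then re-issues the request as soon as each response arrives, so it learns of every change without hammering the server on a fixed interval.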

Anyway, this was uncommon in those days. Most web server software would only handle one request at a time. And if a request went to a database, called out to a web service, interacted with the filesystem, or anything like that, the process would just sit and wait for it to finish instead of handling other requests while it waited.

If you wanted to handle multiple requests concurrently, you'd have to fire up another thread or add more servers with a load balancer.

Nodejs, on the other hand, makes very efficient use of the main process by doing non-blocking IO. Node wasn't the first to do this, but what sets it apart in the non-blocking IO realm is that all its default methods are asynchronous and you have to call a "sync" method to do the wrong thing. It kind of forces users to do the right thing.

Also, it should be noted that the reason JavaScript was chosen is that it already runs in an event loop; it was made to handle asynchronous code. You can have anonymous functions and closures, which makes async actions much easier to maintain.

I also want to mention that using a promise library also makes writing async code much cleaner. For instance, check out bluebirdjs--it has a nice "promisify" method that will convert functions on an object's prototype that have the callback signature (function(error, params){}) to instead return a promise.

JohnnyFun
  • Are there any other Content-Types besides text/html where the browser will respond that way, knowing you want the results incrementally? – HelpMeStackOverflowMyOnlyHope Jul 21 '15 at 03:08
  • Not sure for your question about content-types (it's up to the browser though), but I edited my answer to indicate that this is probably not something that should be done in a prod environment. Instead use websockets/long-polling. (see my edit for more) – JohnnyFun Jul 21 '15 at 04:01
  • To fully answer my question, I'd like to know how to edit my code example to use long polling if that was the technology he used at the time to create the browser progress bar. Your explanation makes sense though and is very helpful in terms of thinking through the problem. – HelpMeStackOverflowMyOnlyHope Jul 21 '15 at 08:40
  • eh, I should really do more research honestly. I'm pretty new to stackoverflow--should I delete my answer so it's clearer for others that this hasn't quite been answered yet? – JohnnyFun Jul 21 '15 at 13:24
  • No your answer is helpful. Once there is an answer submitted that fully covers the question I can mark it as the official answer, and it will get a green checkmark. I typically scan through the various answer that have been submitted, and each can provide a different perspective that help you better understand the solution... so I wouldn't delete what you wrote. – HelpMeStackOverflowMyOnlyHope Jul 21 '15 at 23:33
  • Ok, I updated my answer to include working code. Also note that I was wrong about something: node does **not** block other requests if you wait to call response.end! – JohnnyFun Jul 24 '15 at 06:41
  • Just to note, to do an actual Long Poll it would be really easy based on the code you provided above. Just switch it so that the jQuery .get() within html is a recursive function that keeps checking until status reaches 100. – HelpMeStackOverflowMyOnlyHope Jul 28 '15 at 08:48
  • The "setInterval" essentially makes it recursive--it stops sending "checkstatus" requests once it receives 100 percent complete. Btw, to see it actually change I had to upload a pretty big file--I was using a 200mb file and could see it incrementing. If you use a small file, it just happens too fast. A long poll would be a matter of changing the "checkstatus" endpoint to not bother returning a response until the status is different from the one it returned last time, I think. – JohnnyFun Jul 28 '15 at 15:54
  • Thanks for the good question, btw--I learned a lot researching. – JohnnyFun Jul 28 '15 at 15:55
  • Didn't understand what the `status` is doing but the rest helped me a lot, thanks! – Obzzen Sep 25 '15 at 19:40
  • Good deal, dude. `status` is simply used to tell the front end code how far along the server is in the process. – JohnnyFun Oct 06 '15 at 20:23

Node was more adept at solving this upload problem because of its single-threaded event loop. The code in the http event handlers can easily access the memory used by other event handlers. In a traditional web server environment, the master daemon spins up worker threads to handle the requests. I would imagine that, in the traditional threaded model, it was difficult to check file upload status because the client would need to make a new call to the server asking "what is the file progress?", which would then be handled by a completely separate thread, and that new thread would need to communicate with the thread currently running the upload.