
Note that information irrelevant to my question will be 'quoted' like so (feel free to skip these parts).

Problem

I am using node to make in-order HTTP requests on behalf of multiple clients. This way, what originally took the client(s) several different page loads to get the desired result now takes only a single request via my server. I am currently using the ‘async’ module for flow control and the ‘request’ module for making the HTTP requests. There are approximately 5 callbacks which, according to console.time, take about 2 seconds from start to finish (sketch code included below).

Now I am rather inexperienced with node, but I am aware of its single-threaded nature. While I have read many times that node isn’t built for CPU-bound tasks, I didn’t really understand what that meant until now. If my understanding of what’s going on is correct, this means that what I currently have (in development) will in no way scale to even more than 10 clients.

Question

Since I am not an expert at node, I ask this question (in the title) to get confirmation that making several sequential HTTP requests is indeed blocking.

Epilogue

If that is the case, I expect I will ask a different SO question (after doing the appropriate research) discussing various possible solutions, should I choose to continue approaching this problem in node (which itself may not be suitable for what I'm trying to do).

Other closing thoughts

I am truly sorry if this question was not detailed enough, too noobish, or had particularly flowery language (I try to be concise).

Thanks and all the upvotes to anyone who can help me with my problem!

The code I mentioned earlier:

var async = require('async');
var request = require('request');

...

async.waterfall([
    function(cb) {
        console.time('1');

        request(someUrl1, function(err, res, body) {
            // load and parse the given web page.

            // make a callback with data parsed from the web page
        });
    },
    function(someParameters, cb) {
        console.timeEnd('1');
        console.time('2');

        request({url: someUrl2, method: 'POST', form: {/* data */}}, function(err, res, body) {
            // more computation

            // make a callback with a session cookie given by the visited url
        });
    },
    function(jar, cb) {
        console.timeEnd('2');
        console.time('3');

        request({url: someUrl3, method: 'GET', jar: jar /* cookie from the previous callback */}, function(err, res, body) {
            // do more parsing + computation

            // make another callback with the results
        });
    },
    function(moreParameters, cb) {
        console.timeEnd('3');
        console.time('4');

        request({url: someUrl4, method: 'POST', jar: jar, form : {/*data*/}}, function(err, res, body) {
            // make final callback after some more computation.
            //This part takes about ~1s to complete
        });
    }
], function (err, result) {
    console.timeEnd('4');
    res.status(200).send();
});
  • No, it is not blocking - hence why the callback is required to handle the asynchronous result. Naturally, if there is a required serialized relationship then the previous requests must complete in order to use the results. – user2864740 Oct 06 '15 at 02:17
  • @user2864740 Is it still non-blocking if each of my function calls take ~1s? This happens only for the last function in the example code provided, but I think that means that for the 1s that node is processing the user's request, no other requests can be made? Surely this doesn't scale? Perhaps that would've been a better question to ask....... – youngrrrr Oct 06 '15 at 02:20
  • Blocking would mean that no other code - in the same execution context - could be run before the request completed. In this example the `request` calls will *return immediately* and subsequent operations can be done; hence non-blocking. Waiting on a request (as may be required) does not make the `request` call blocking. – user2864740 Oct 06 '15 at 02:21
  • The "waterfall" approach effectively causes serialization of the requests, but this is only an ordering imposed over the underlying async model and non-blocking IO done by Node.js. Trivial counter-examples include running multiple waterfalls in parallel, or even handling multiple concurrent Node.js requests. – user2864740 Oct 06 '15 at 02:25
  • @user2864740 I still have concerns about using node for this particular application then. I'm worried about how it will scale considering that the work it is doing is not simple I/O but computationally intensive. I feel as though I should switch to a platform that enables multi-threading. Does what I'm asking/the concern I'm expressing make sense? I could just be spitting total BS and have no idea what I'm talking about. – youngrrrr Oct 06 '15 at 02:27
  • Node.js will use "0 CPU" when performing the IO web request itself (the process is so ridiculously IO bound). If there *actually* is 'significant processing' (ie. in a callback), that is a separate concern from how much time it takes to fetch the external resource. Node.js computation is trivially "limited to one core/thread", which is often fast enough, unless using an additional concurrency strategy such as multiprocessing. – user2864740 Oct 06 '15 at 02:28
  • If it is a problem that the computations in the request callback are taking 1 second (which is indeed 'a long time to block node'), then work on making that the focus of the question/exploration. – user2864740 Oct 06 '15 at 02:32
  • @user2864740 ok, right. that makes sense. Thanks for your help and attention! – youngrrrr Oct 06 '15 at 02:34

3 Answers


Your code is non-blocking because it uses non-blocking I/O with the request() function. This means that node.js is free to service other requests while your series of http requests is being fetched.

What async.waterfall() does is order your requests to be sequential and pass the results of one on to the next. The requests themselves are non-blocking and async.waterfall() does not change or influence that. The series you have just means that you have multiple non-blocking requests in a row.

What you have is analogous to a series of nested setTimeout() calls. For example, this sequence of code takes 5 seconds to get to the inner callback (like your async.waterfall() takes n seconds to get to the last callback):

setTimeout(function() {
    setTimeout(function() {
        setTimeout(function() {
            setTimeout(function() {
                setTimeout(function() {
                    // it takes 5 seconds to get here
                }, 1000);
            }, 1000);
        }, 1000);
    }, 1000);
}, 1000);

But, this uses basically zero CPU because it's just 5 consecutive asynchronous operations. The actual node.js process is involved for probably no more than 1ms to schedule the next setTimeout() and then the node.js process literally could be doing lots of other things until the system posts an event to fire the next timer.
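To see that the scheduling itself costs essentially nothing, here's a runnable sketch (the timings are illustrative, not taken from the question):

```javascript
// A minimal sketch: scheduling nested timers returns immediately,
// so the process is free to do other work while they are pending.
var start = Date.now();

setTimeout(function () {
    setTimeout(function () {
        console.log('inner timer fired after ~2s');
    }, 1000);
}, 1000);

// This line runs right away -- scheduling the timers did not block.
var elapsed = Date.now() - start;
console.log('scheduling took ' + elapsed + 'ms');
```

Even though the inner callback fires two seconds later, the code after the setTimeout() calls runs within a millisecond or two, because scheduling is non-blocking.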

You can read more about how the node.js event queue works in these references:

Run Arbitrary Code While Waiting For Callback in Node?

blocking code in non-blocking http server

Hidden threads in Javascript/Node that never execute user code: is it possible, and if so could it lead to an arcane possibility for a race condition?

How does JavaScript handle AJAX responses in the background? (written about the browser, but concept is the same)

If I have a correct understanding of what’s going on, this means that what I currently have (in development) is in no way going to scale to even more than 10 clients.

This is not a correct understanding. A node.js process can easily have thousands of non-blocking requests in flight at the same time. Your sequentially measured time is only a start to finish time - it has nothing to do with CPU resources or other OS resources consumed (see comments below on non-blocking resource consumption).

I still have concerns about using node for this particular application then. I'm worried about how it will scale considering that the work it is doing is not simple I/O but computationally intensive. I feel as though I should switch to a platform that enables multi-threading. Does what I'm asking/the concern I'm expressing make sense? I could just be spitting total BS and have no idea what I'm talking about.

Non-blocking I/O consumes almost no CPU (only a little when the request is originally sent and then a little when the result arrives back), but while the computer is waiting for the remote result, no CPU is consumed at all and no OS thread is consumed. This is one of the reasons that node.js scales well for non-blocking I/O: no resources are used while the computer is waiting for a response from a remote site.

If your processing of the request is computationally intensive (e.g. takes a measurable amount of pure blocking CPU time to process), then yes you would want to explore getting multiple processes involved in running the computations. There are multiple ways to do this. You can use clustering (so you simply have multiple identical node.js processes, each working on requests from different clients) with the nodejs clustering module. Or, you can create a work queue of computationally intensive work to do and have a set of child processes that do the computationally intensive work. Or, there are several other options too. This is not the type of problem that one needs to switch away from node.js to solve - it can be solved using node.js just fine.

jfriend00
  • Hi! Thanks so much for taking the time to reply. I read everything you wrote (minus some of the other SO posts you linked to - but I'll get to those, I promise!). I accepted the other answer because he was able to more concretely link the concepts he introduced to my particular code. That's not to say, however, that your answer wasn't also spectacular! Admittedly, you also did answer the exact question I asked (and more) but since I can only accept one answer, I will go with the one that helped all the pieces click together. Still, thanks for your very thoughtful and well-written answer! – youngrrrr Oct 08 '15 at 06:47
  • And additionally, thanks for addressing my commented concerns and +1 for mentioning clustering as a possible solution while still using node! – youngrrrr Oct 08 '15 at 06:48

Normally, I/O in node.js is non-blocking. You can test this out by making several requests simultaneously to your server. For example, if each request takes 1 second to process, a blocking server would take 2 seconds to process 2 simultaneous requests but a non-blocking server would take just a bit more than 1 second to process both requests.

However, you can deliberately make requests blocking by using the sync-request module instead of request. Obviously, that's not recommended for servers.

Here's a bit of code to demonstrate the difference between blocking and non-blocking I/O:

var req = require('request');
var sync = require('sync-request');

// Load example.com N times (yes, it's a real website):
var N = 10;

console.log('BLOCKING test ==========');
var start = new Date().valueOf();
for (var i=0;i<N;i++) {
    var res = sync('GET', 'http://www.example.com');
    console.log('Downloaded ' + res.getBody().length + ' bytes');
}
var end = new Date().valueOf();
console.log('Total time: ' + (end-start) + 'ms');

console.log('NON-BLOCKING test ======');
var loaded = 0;
var start = new Date().valueOf();
for (var i=0;i<N;i++) {
    req('http://www.example.com',function( err, response, body ) {
        loaded++;
        console.log('Downloaded ' + body.length + ' bytes');
        if (loaded == N) {
            var end = new Date().valueOf();
            console.log('Total time: ' + (end-start) + 'ms');
        }
    })
}

Running the code above, you'll see that the non-blocking test takes roughly the same amount of time to process all requests as it does to process a single request (for example, with N = 10 the non-blocking code runs roughly 10 times faster than the blocking code). This clearly illustrates that the requests are non-blocking.


Additional answer:

You also mentioned that you're worried about your process being CPU intensive. But in your code, you're not benchmarking CPU usage. You're mixing network request time (I/O, which we know is non-blocking) with CPU processing time. To measure how much time your code actually spends blocking, change it to this:

async.waterfall([
    function(cb) {
        request(someUrl1, function(err, res, body) {
            console.time('1');
            // load and parse the given web page.
            console.timeEnd('1');
            // make a callback with data parsed from the web page
        });
    },
    function(someParameters, cb) {
        request({url: someUrl2, method: 'POST', form: {/* data */}}, function(err, res, body) {
            console.time('2');
            // more computation
            console.timeEnd('2');

            // make a callback with a session cookie given by the visited url
        });
    },
    function(jar, cb) {
        request({url: someUrl3, method: 'GET', jar: jar /* cookie from the previous callback */}, function(err, res, body) {
            console.time('3');
            // do more parsing + computation
            console.timeEnd('3');
            // make another callback with the results
        });
    },
    function(moreParameters, cb) {
        request({url: someUrl4, method: 'POST', jar: jar, form : {/*data*/}}, function(err, res, body) {
            console.time('4');
            // some more computation.
            console.timeEnd('4');

            // make final callback
        });
    }
], function (err, result) {
    res.status(200).send();
});

Your code only blocks in the "more computation" parts. So you can completely ignore any time spent waiting for the other parts to execute. In fact, that's exactly how node can serve multiple requests concurrently. While waiting for the other parts to call the respective callbacks (you mention that it may take up to 1 second) node can execute other javascript code and handle other requests.
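To make that distinction concrete, here's a small runnable sketch (the timings are made up for illustration) contrasting the non-blocking wait with the blocking computation inside the callback:

```javascript
// While the 200ms "network wait" is pending, the event loop is free;
// only the synchronous loop inside the callback actually blocks it.
var start = Date.now();

setTimeout(function () {
    var waited = Date.now() - start;          // non-blocking wait (~200ms)
    var computeStart = Date.now();
    var x = 0;
    while (Date.now() - computeStart < 100) { // ~100ms of blocking "computation"
        x++;
    }
    console.log('waited ' + waited + 'ms, then blocked ~100ms');
}, 200);

// Runs immediately: scheduling the "request" did not block.
var scheduled = Date.now() - start;
```

Only the while loop holds up the event loop; during the 200ms wait, node could be serving any number of other clients.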

slebetman
  • Thanks for your response and the example you included. While both answers I received were incredibly descriptive and helpful, I will accept yours since you were able to relate back to my console.log statements. Your additional answer made it clearer as to which operations, exactly, were/weren't blocking. I also tested it out and each of my 'computations' turned out to take less than ~100ms! Thanks for your contribution! – youngrrrr Oct 08 '15 at 06:45

You can use a queue to process concurrent HTTP calls in Node.js: https://www.npmjs.com/package/concurrent-queue

var cq = require('concurrent-queue');
test_queue = cq();

// request action method
testQueue: function(req, res) {
    // queue each request to be processed sequentially
    test_queue(req.user, function (err, user) {
        console.log(user.id + ' done');
        res.json(200, user);
    });
},


// The queue will be processed one item at a time.
test_queue.limit({ concurrency: 1 }).process(function (user, cb) {
    console.log(user.id + ' started');

    // async calls go here
    setTimeout(function () {
        // in the async callback, call cb to return the response
        cb(null, user);
    }, 1000);
});

Please remember that this should only be implemented for sensitive business calls, where a resource needs to be accessed or updated by one user at a time.

This will block your I/O, make your users wait, and slow down response times.

Optimization:

You can make this faster by creating resource-dependent queues: a separate queue for each shared resource, so that calls for the same resource execute sequentially while calls for different resources execute concurrently.

Suppose you want to do this per user. Then HTTP calls for the same user execute sequentially, while calls for different users execute concurrently:

testQueue: function(req, res) {

    // Note: test_queue here is a plain object ({}) mapping user ids to queues
    // if a queue does not exist for the current user:
    if (!test_queue.hasOwnProperty(req.user.id)) {
        // initialize a queue for the current user
        test_queue[req.user.id] = cq();
        // initialize queue processing for the current user;
        // each queue is processed one item at a time.
        test_queue[req.user.id].limit({ concurrency: 1 }).process(function (task, cb) {
            console.log(task.id + ' started');
            // async functionality goes here
            setTimeout(function () {
                cb(null, task);
            }, 1000);
        });
    }
    }

    // queuing each request in user specific queue to process sequentially
    test_queue[req.user.id](req.user, function (err, user) {
        if(err){
            return;
        }
        res.json(200, user)
        console.log(user.id+' done');
    });
},

This will be fast, and will block I/O only for the specific resource you choose.

Ibtesam Latif