
In the following code I am trying to make multiple (around 10) HTTP requests and RSS parses in one go.

I am using the standard forEach construct on an array of URIs I need to access and parse the result of.

Code:

var articles = [];

feedsToFetch.forEach(function (feedUri)
{   
        feed(feedUri, function(err, feedArticles) 
        {
            if (err)
            {
                throw err;
            }
            else
            {
                articles = articles.concat(feedArticles);
            }
        });
 });

 // Code I want to run once all feedUris have been visited

I understand that I should be using a callback. However, the only way I can think of using a callback in this example would be to call a function that counts how many times it has been called, and only continues once it has been called feedsToFetch.length times, which seems hacky.
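For concreteness, the counter version I have in mind looks something like this (a sketch; `feed` is stubbed here so it runs standalone):

```javascript
// Stand-in for the real HTTP fetch + RSS parse, so this sketch runs standalone
function feed(uri, cb) {
  setTimeout(function () { cb(null, [uri + "-article"]); }, 10);
}

var feedsToFetch = ["a", "b", "c"];
var articles = [];
var remaining = feedsToFetch.length;

feedsToFetch.forEach(function (feedUri) {
  feed(feedUri, function (err, feedArticles) {
    if (err) throw err;
    articles = articles.concat(feedArticles);

    // the "hacky" part: count completions, fire once the last one lands
    if (--remaining === 0) allDone();
  });
});

function allDone() {
  // code I want to run once all feedUris have been visited
  console.log("fetched", articles.length, "articles");
}
```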

So my question is: what is the best way to handle this type of situation in Node.js?

Preferably without any form of blocking! (I still want that blazing fast speed.) Is it promises, or something else?

Thanks, Danny

dannybrown
  • Yes, promises are the simplest way to go. If you'd rather cook up your own solution using standard JS, [this answer of mine](http://stackoverflow.com/questions/10075526/firing-callback-after-multiple-ajax-requests-are-complete/10075575#10075575) can be easily adapted. – Amadan Oct 09 '14 at 00:49
  • please show the code for the `feed()` function. – jfriend00 Oct 09 '14 at 00:51
  • @Amadan, maybe it's personal preference, but I don't think Promises are the "simplest" way to go. I've provided an answer to expand on this. – Mulan Oct 09 '14 at 02:35
  • You've figured it out: use a counter and count the number of requests made and completed. It's basically what async and promises do internally. About a year before the async library was written I wrote this answer to solve this exact problem: http://stackoverflow.com/questions/4631774/coordinating-parallel-execution-in-node-js/4631909#4631909 . This is a more advanced implementation that allows you to launch batches of async operations: http://stackoverflow.com/questions/13250746/process-chain-of-functions-without-ui-block/13252018#13252018 – slebetman Oct 09 '14 at 04:06
  • @naomik: I'd say personal preference - I don't see aarosil's answer much more complex than yours. – Amadan Oct 09 '14 at 06:10
  • @DanTonyBrown does it matter to you what order the articles end up in? The `serial` option by @naomik below is the only one that will preserve the order, unless I'm mistaken. – Stop Slandering Monica Cellio Oct 09 '14 at 15:40
  • @sequoiamcdowell with promises that would be a standard for loop... promises are an actual abstraction on sequencing things - it's a lot more sound and complete than the 'async' module, most modern languages have them under different names and they've been widely adopted. – Benjamin Gruenbaum Oct 09 '14 at 22:12
  • @BenjaminGruenbaum "promises is widely adopted [[by everything except IE](http://caniuse.com/#feat=promises)]". I like how you critique my answer for using a lib completely overlooking the fact I provided a roll-your-own solution as well. At the same time, you're basically saying "use something that's not supported in IE or get a polyfill (see: "lib")". – Mulan Oct 10 '14 at 02:16
  • I stand corrected-- .map preserves order. – Stop Slandering Monica Cellio Oct 10 '14 at 17:20

3 Answers


No hacks necessary

I would recommend using the async module as it makes these kinds of things a lot easier.

async provides async.eachSeries as an async-aware replacement for arr.forEach, and lets you pass a done callback to run when everything is complete. It processes each item in series, just as forEach does. It also conveniently bubbles errors up to your final callback, so you don't need error-handling logic inside the loop. If you want/require parallel processing, you can use async.each.

There will be no blocking between the async.eachSeries call and the callback.

async.eachSeries(feedsToFetch, function(feedUri, done) {

  // call your async function
  feed(feedUri, function(err, feedArticles) {

    // if there's an error, "bubble" it to the callback
    if (err) return done(err);

    // your operation here;
    articles = articles.concat(feedArticles);

    // this task is done
    done();
  });
}, function(err) {

  // errors generated in the loop above will be accessible here
  if (err) throw err;

  // we're all done!
  console.log("all done!");
});

Alternatively, you could build an array of async tasks and pass them to async.series. async.series processes the tasks one at a time (not in parallel) and calls the callback when every task has completed. The only reason to prefer this over async.eachSeries is if you like the familiar arr.forEach syntax for building the task list.

// create an array of async tasks
var tasks = [];

feedsToFetch.forEach(function (feedUri) {

  // add each task to the task array
  // note: each task must accept and call a callback, or series never advances
  tasks.push(function(callback) {

    // your operations
    feed(feedUri, function(err, feedArticles) {
      if (err) return callback(err);
      articles = articles.concat(feedArticles);
      callback();
    });
  });
});

// call async.series with the task array and callback
async.series(tasks, function(err) {
  if (err) throw err;
  console.log("done!");
});

Or you can Roll Your Own™

Perhaps you're feeling extra ambitious or maybe you don't want to rely upon the async dependency. Maybe you're just bored like I was. Anyway, I purposely copied the API of async.eachSeries to make it easy to understand how this works.

Once we remove the comments here, we have just 9 lines of code that can be reused for any array we want to process asynchronously! It will not modify the original array, errors can be sent to "short circuit" the iteration, and a separate callback can be used. It will also work on empty arrays. Quite a bit of functionality for just 9 lines :)

// void asyncForEach(Array arr, Function iterator, Function callback)
//   * iterator(item, done) - done can be called with an err to shortcut to callback
//   * callback(err)        - receives an error if an iterator sent one
function asyncForEach(arr, iterator, callback) {

  // create a cloned queue of arr
  var queue = arr.slice(0);

  // create a recursive iterator
  function next(err) {

    // if there's an error, bubble to callback
    if (err) return callback(err);

    // if the queue is empty, call the callback with no error
    if (queue.length === 0) return callback(null);

    // hand the next item to the iterator
    // we pass `next` here so the task can let us know when to move on
    iterator(queue.shift(), next);
  }

  // start the loop
  next();
}

Now let's create a sample async function to use with it. We'll fake the delay with a setTimeout of 500 ms here.

// void sampleAsync(String uri, Function done)
//   * done receives message string after 500 ms
function sampleAsync(uri, done) {

  // fake delay of 500 ms
  setTimeout(function() {

    // our operation
    // <= "foo"
    // => "async foo !"
    var message = ["async", uri, "!"].join(" ");

    // call done with our result
    done(message);
  }, 500);
}

Ok, let's see how they work !

var tasks = ["cat", "hat", "wat"];

asyncForEach(tasks, function(uri, done) {
  sampleAsync(uri, function(message) {
    console.log(message);
    done();
  });
}, function() {
  console.log("done");
});

Output (500 ms delay before each output)

async cat !
async hat !
async wat !
done
Mulan
  • note: in node/v8 vNext, it will have yield. Search for that. – bryanmac Oct 09 '14 at 04:02
  • the asyncEach here behaves more like `async.eachSeries` than `async.each` fwiw. `async.each` will fire off the http requests (or whatever) as fast as it can, i.e. concurrently; `async.eachSeries` (or the "serial" solution here) will do one after the next. Unless I'm mistaken. – Stop Slandering Monica Cellio Oct 09 '14 at 15:42
  • @sequoiamcdowell, good catch. I've renamed the custom `asyncEach` to `asyncForEach` to imply that it does serial processing like `arr.forEach`. I've also updated the `async.each` to `async.eachSeries`. Thanks! – Mulan Oct 09 '14 at 17:14
  • "No hacks necessary... Use this library" - I don't follow that logic, on the other hand NodeJS ships with promises which would make this trivial. – Benjamin Gruenbaum Oct 09 '14 at 22:09
  • @BenjaminGruenbaum, Using a lib doesn't mean "hack". I also provided a "roll your own" solution. – Mulan Oct 10 '14 at 02:14
  • @BenjaminGruenbaum I did not know node shipped with promises & my brand new install doesn't seem to have them. What's the universal, no-hack way to access native promises in node.js today? – Stop Slandering Monica Cellio Oct 10 '14 at 17:35
  • @sequoiamcdowell `$ echo "Promise" | node` should give you `ReferenceError: Promise is not defined`. This is a feature. Just add `bluebird` and all of your problems go away. – Mulan Oct 10 '14 at 17:42
  • @sequoiamcdowell what version of Node are you using? It's in since about 0.11.13 – Benjamin Gruenbaum Nov 06 '14 at 16:10
  • @BenjaminGruenbaum So when you say node "ships with" Promises you mean "if you use the [very latest](https://github.com/joyent/node/blob/7ca5af87a0662a1f8dca2391ae90a6342592302b/ChangeLog) **unstable** release, and make sure to run it with the `--harmony` switch to enable experimental features, [node ships with promises]"? This is a very peculiar definition of "no hacks" and of "ships with" :) Incidentally it's funny you should say "since about", since the release you've named is *the very latest* tagged unstable release. – Stop Slandering Monica Cellio Nov 06 '14 at 20:44

HACK-FREE SOLUTION

Promises are slated for inclusion in the next version of JavaScript (ES6).

The popular Promise libraries give you an .all() method for this exact use case (waiting for a bunch of async calls to complete, then doing something else). It's a perfect match for your scenario.
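A minimal sketch of the .all() approach (using native promises here; Bluebird's Promise.all works the same way). The callback-style feed is stubbed and wrapped in a promise so the example runs standalone:

```javascript
// Stub for the asker's callback-style `feed`, so the sketch runs standalone
function feed(uri, cb) {
  setTimeout(function () { cb(null, [uri + "-article"]); }, 10);
}

// Wrap the callback API in a promise (Bluebird's promisify does this for you)
function feedAsync(uri) {
  return new Promise(function (resolve, reject) {
    feed(uri, function (err, feedArticles) {
      if (err) reject(err);
      else resolve(feedArticles);
    });
  });
}

var feedsToFetch = ["cat", "hat", "wat"];

var allArticles = Promise.all(feedsToFetch.map(feedAsync))
  .then(function (results) {
    // results holds each feed's articles, in input order
    return [].concat.apply([], results);
  });
```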

Bluebird also has .map(), which can take an array of values and use it to start a Promise chain.

Here is an example using Bluebird .map():

var Promise = require('bluebird');
var request = Promise.promisifyAll(require('request'));

function processAllFeeds(feedsToFetch) {    
    return Promise.map(feedsToFetch, function(feed){ 
        // I renamed your 'feed' fn to 'processFeed'
        return processFeed(feed) 
    })
    .then(function(articles){
        // 'articles' is now an array w/ results of all 'processFeed' calls
        // do something with all the results...
    })
    .catch(function(e){
        // feed server was down, etc
    })
}

function processFeed(feed) { 
    // use the promisified version of 'get'
    return request.getAsync(feed.url)... 
}

Notice also that you don't need to use a closure here to accumulate the results.

The Bluebird API docs are really well written too, with lots of examples, which makes them easy to pick up.

Once I learned the Promise pattern, it made life so much easier. I can't recommend it enough.

Also, here is a great article about different approaches to dealing with async functions using promises, the async module, and others

Hope this helps!

aarosil

Using a copy of the URL list as a queue to keep track of arrivals makes this simple (all changes commented):

var q = feedsToFetch.slice(); // copy; each URL is removed on arrival (to track progress)

feedsToFetch.forEach(function (feedUri)
{   
        feed(feedUri, function(err, feedArticles) 
        {
            if (err)
            {
                throw err;
            }
            else
            {
                articles = articles.concat(feedArticles);
            }
            q.splice(q.indexOf(feedUri),1); //remove this url from list
            if(!q.length) done(); // all urls have been removed: fire the waiting code
        });
 });

function done(){
  // Code I want to run once all feedUris have been visited

}

In the end, this is not much "dirtier" than promises, and it gives you the chance to retry unfinished URLs (a counter alone won't tell you which one(s) failed). For this simple parallel download task, implementing Promises will actually add more code to your project than a simple queue, and Promise.all() is not in the most intuitive place to stumble across. Once you get into sub-sub-queries, or want better error handling than a train wreck, I strongly recommend Promises, but you don't need a rocket launcher to kill a squirrel...

dandavis