5

I'm trying to find a good pattern to execute a bunch of parallel tasks.

Let me define some task to exemplify. Tasks a, b, c, d, e, f, g execute as a(function(er, ra){//task a returned, ra is result}), so do b to g

There are also some tasks that should be execute after some task is done, let's call them ab, bc, abc, bd, bcd, af, fg, means when a and b has returned ab(ra, rb) should be executed at once, and when b and c returned, bc(rb, rc) should be executed at once, and if a, b, c all returned, abc(ra, rb, rc) should be executed.

For the simplest case, if there is only a and b, I can do something like this:

(function(cb){
    var count = 2, _ra, _rb;
    function update(){if(--count == 0) cb(null, _ra, _rb)}
    a(function(er, ra){_ra = ra; update()});
    b(function(er, ra){_rb = rb; update()});
})(function(er, ra, rb){
    ab(ra, rb);
});

As you can see, a and b execute in parallel, and when both are done, ab(ra, rb) execute.

But how can I do more things for a lot of parallel tasks?

Dan Dascalescu
  • 143,271
  • 52
  • 317
  • 404
guilin 桂林
  • 17,050
  • 29
  • 92
  • 146

6 Answers6

14

What you actually want is a deferred pattern though like futures.

function defer(f) {
    // create a promise.
    var promise = Futures.promise();
    f(function(err, data) {
        if (err) {
            // break it
            promise.smash(err);
        } else {
            // fulfill it
            promise.fulfill(data);
        }
    });
    return promise;
}
var da = defer(a), db = defer(b), dc = defer(c), dd = defer(d), de = defer(e), df = defer(f), dg = defer(g);

// when a and b are fulfilled then call ab
// ab takes one parameter [ra, rb]
Futures.join(da, db).when(ab);
Futures.join(db, dc).when(bc);
// abc takes one parameter [ra, rb, rc]
Futures.join(da, db, dc).when(abc);
Futures.join(db, dd).when(bd);
Futures.join(db, dc, dd).when(bcd);
Futures.join(da, df).when(af);
// where's e ?
Futures.join(df,dg).when(fg);
Futures.join(da,db,dc,dd,de,df,dg).fail(function() {
    console.log(":(");
});
Raynos
  • 166,823
  • 56
  • 351
  • 396
  • 6
    It's almost like music it's so beautiful – jcolebrand May 13 '11 at 17:31
  • Is this genuinely handling those tasks in parallel, when node.js is single threaded running in a single process? Or is it pushing the tasks to the end of the event queue, to be handled later, when the event queue has nothing else to do? – Ant Kutschera May 16 '11 at 20:19
  • @AntKutschera there is no multi threading. It's simply saying run a-g in parallel (there asynchronous handled by a different process in a different thread) and when any pair finishes run this function – Raynos May 16 '11 at 20:27
  • who starts that child process? The Futures library? How? – Ant Kutschera May 16 '11 at 20:41
  • @AntKutschera I assumed `a,b,c,d, etc` all either start child processes or do asynchronous IO through a socket or stream. Node does a minimal amount of lifting. Futures doesn't. Futures just says when both da and db return run ab. – Raynos May 16 '11 at 21:04
8

You should check out Step ( https://github.com/creationix/step ). It's just over a hundred lines of code, so you can read the whole thing if needed.

My preferred pattern looks something like this:


function doABunchOfCrazyAsyncStuff() {
  Step (
    function stepA() {
      a(arg1, arg2, arg3, this); // this is the callback, defined by Step
    }
    ,function stepB(err, data) {
      if(err) throw err; // causes error to percolate to the next step, all the way to the end.  same as calling "this(err, null); return;"
      b(data, arg2, arg3, this);
    }
    ,function stepC(err, data) {
      if(err) throw err;
      c(data, arg2, arg3, this);
    }
    ,function stepDEF(err, data) {
      if(err) throw err;
      d(data, this.parallel());
      e(data, this.parallel());
      f(data, this.parallel());
    }
    ,function stepGGG(err, dataD, dataE, dataF) {
      if(err) throw err;
      var combined = magick(dataD, dataE, dataF);
      var group = this.group();  // group() is how you get Step to merge multiple results into an array
      _.map(combined, function (element) {
        g(element, group()); 
      });
    }
    ,function stepPostprocess(err, results) {
      if(err) throw err;
      var processed = _.map(results, magick);
      return processed; // return is a convenient alternative to calling "this(null, result)"
    }
    ,cb // finally, the callback gets (err, result) from the previous function, and we are done
  );
}

Notes

  • My example also uses the underscore library, "the tie to match JQuery's tux": http://documentcloud.github.com/underscore/
  • Naming each step function stepXXXXX is a good habit so that the stack traces are clear and readable.
  • Step lets you make powerful and elegant combinations of serial and parallel execution. These patterns are straightforward and comprehensible. If you need anything more complex like "when 3 of 5 of these methods completes, go to the next step", seriously RETHINK your design. do you really need such a complex pattern? (maybe you are waiting for a quorum set). Such a complex pattern deserves a function of it's own.
Dave Dopson
  • 41,600
  • 19
  • 95
  • 85
  • is group tasks are parallel or serial – Ganesh Kumar Mar 10 '12 at 06:05
  • There are a couple downsides to this over async, namely if you want to call a utility method, such as with waterfall, you'd need to have a wrapper to handle the err. ex: collection.find.bind(...), where you can use function binders to methods in your chain... this is really useful, and doesn't work with step. – Tracker1 Jul 25 '13 at 17:21
2

Try to look at step module and this article.

yojimbo87
  • 65,684
  • 25
  • 123
  • 131
  • That should be a compliment. Explaining how to use step is an answer. – Raynos May 13 '11 at 15:54
  • 2
    @Raynos it took me a couple times reading that to realize I think you meant 'comment', but don't change it because that's pretty funny – MetaGuru Jan 29 '13 at 02:30
1

yeah, look at flow control module, like step, chain, or flow ~ and i think there is something like that in underscore.js too

T1B0
  • 11
  • 1
0

nimble is another good choice.

Khanh Nguyen
  • 11,112
  • 10
  • 52
  • 65
0

A very simple barrier just for this: https://github.com/berb/node-barrierpoints

b_erb
  • 20,932
  • 8
  • 55
  • 64