
I have two functions that run asynchronously. I'd like to write them using the waterfall model, but I don't know how.

Here is my code:

var fs = require('fs');
function updateJson(ticker, value) {
  //var stocksJson = JSON.parse(fs.readFileSync("stocktest.json"));
  fs.readFile('stocktest.json', function(error, file) {
    var stocksJson =  JSON.parse(file);

    if (stocksJson[ticker]!=null) {
      console.log(ticker+" price : " + stocksJson[ticker].price);
      console.log("changing the value...")
      stocksJson[ticker].price =  value;

      console.log("Price after the change has been made -- " + stocksJson[ticker].price);
      console.log("printing the JSON.stringify output");
      console.log(JSON.stringify(stocksJson, null, 4));
      fs.writeFile('stocktest.json', JSON.stringify(stocksJson, null, 4), function(err) {  
        if(!err) {
          console.log("File successfully written");
        }
        if (err) {
          console.error(err);
        }
      }); //end of writeFile
    } else {
      console.log(ticker + " doesn't exist on the json");
    }
  });
} // end of updateJson 

Any idea how I can write it using waterfall, so that I'll be able to control the flow? Please include some examples, because I'm new to Node.js.

Talha Awan
Alex Brodov
  • Basically you want to know where in your code you have to call your second function to make it execute only when the first one has ended the job to write to file, and maybe repeat this procedure on another asynchronous function, right? – MastErAldo Sep 06 '14 at 22:22

3 Answers

63

First, identify the steps and write them as asynchronous functions (each taking a callback argument):

  • read the file

    function readFile(readFileCallback) {
        fs.readFile('stocktest.json', function (error, file) {
            if (error) {
                readFileCallback(error);
            } else {
                readFileCallback(null, file);
            }
        });
    }
    
  • process the file (I removed most of the console.log in the examples)

    function processFile(file, processFileCallback) {
        var stocksJson = JSON.parse(file);
        if (stocksJson[ticker] != null) {
            stocksJson[ticker].price = value;
            fs.writeFile('stocktest.json', JSON.stringify(stocksJson, null, 4), function (error) {
                if (error) {
                    processFileCallback(error);
                } else {
                    console.log("File successfully written");
                    processFileCallback(null);
                }
            });
        }
        else {
            console.log(ticker + " doesn't exist on the json");
            processFileCallback(null); // the callback must always be called exactly once
        }
    }
    

Note that I did no specific error handling here; I'll take advantage of async.waterfall to centralize error handling in one place.

Also be careful: if an asynchronous function has branches (if/else/switch/...), every branch must call the callback exactly once.
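One way to catch a violation of this rule during development is to wrap the callback in a guard. The following is only a minimal sketch: `onlyOnce` and `buggyStep` are hypothetical names introduced for illustration, not functions from the code above.

```javascript
// Hypothetical helper: wraps a callback so that a second invocation throws
// instead of silently running the next step twice.
function onlyOnce(callback) {
    var called = false;
    return function () {
        if (called) {
            throw new Error('Callback was already called.');
        }
        called = true;
        return callback.apply(this, arguments);
    };
}

// Hypothetical step with a branch bug: it calls back twice.
function buggyStep(callback) {
    callback(null, 'first');
    callback(null, 'second'); // with a guarded callback, this call throws
}
```

Calling `buggyStep(onlyOnce(someCallback))` would then fail loudly on the second call, which makes the faulty branch easy to locate.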

Plug everything with async.waterfall

async.waterfall([
    readFile,
    processFile
], function (error) {
    if (error) {
        //handle readFile error or processFile error here
    }
});
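Note that `processFile` refers to `ticker` and `value`, so the step functions have to be defined where those variables are in scope (for example inside `updateJson`). An alternative, sketched here under assumptions, is a small factory that captures them in a closure. `makeProcessFile` and its optional `writeFile` parameter are hypothetical names added for illustration and easier testing.

```javascript
var fs = require('fs');

// Hypothetical factory: captures ticker and value in a closure so the
// returned step has the (file, callback) shape that waterfall expects.
// writeFile is injectable for testing and defaults to fs.writeFile.
function makeProcessFile(ticker, value, writeFile) {
    writeFile = writeFile || fs.writeFile;
    return function processFile(file, processFileCallback) {
        var stocksJson = JSON.parse(file);
        if (stocksJson[ticker] == null) {
            console.log(ticker + " doesn't exist on the json");
            return processFileCallback(null);
        }
        stocksJson[ticker].price = value;
        // The write callback receives only an error (or null), so the
        // step's own callback can be passed straight through.
        writeFile('stocktest.json', JSON.stringify(stocksJson, null, 4), processFileCallback);
    };
}

// Intended use inside updateJson(ticker, value), assuming the async module is loaded:
// async.waterfall([readFile, makeProcessFile(ticker, value)], function (error) { /* ... */ });
```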

Clean example

The previous code was excessively verbose to make the explanations clearer. Here is a full cleaned example:

async.waterfall([
    function readFile(readFileCallback) {
        fs.readFile('stocktest.json', readFileCallback);
    },
    function processFile(file, processFileCallback) {
        var stocksJson = JSON.parse(file);
        if (stocksJson[ticker] != null) {
            stocksJson[ticker].price = value;
            fs.writeFile('stocktest.json', JSON.stringify(stocksJson, null, 4), function (error) {
                if (!error) {
                    console.log("File successfully written");
                }
                processFileCallback(error);
            });
        }
        else {
            console.log(ticker + " doesn't exist on the json");
            processFileCallback(null);
        }
    }
], function (error) {
    if (error) {
        //handle readFile error or processFile error here
    }
});

I kept the function names because they improve readability and help when debugging with tools such as the Chrome debugger.

If you use underscore (on npm), you can also replace the first function with _.partial(fs.readFile, 'stocktest.json')
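For readers who have not met partial application before, the idea can be sketched by hand. The `partial` helper below is a hypothetical, simplified illustration of the concept, not underscore's implementation, and `fakeReadFile` is a stand-in for `fs.readFile` so the snippet runs without a real file.

```javascript
// Hypothetical helper: returns a function with the leading arguments pre-filled.
function partial(fn) {
    var preset = Array.prototype.slice.call(arguments, 1);
    return function () {
        var rest = Array.prototype.slice.call(arguments);
        return fn.apply(this, preset.concat(rest));
    };
}

// Stand-in for fs.readFile(path, callback), so no real file is needed.
function fakeReadFile(path, callback) {
    callback(null, 'contents of ' + path);
}

// The resulting function takes only a callback, which is the shape
// waterfall expects from its first step.
var readStocks = partial(fakeReadFile, 'stocktest.json');
readStocks(function (error, file) {
    console.log(file); // contents of stocktest.json
});
```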

Volune
  • So basically the `processFile` is getting the `file` from the callback of the `fs.readFile` right? Should i declare first of all all the functions then use them in waterfall? You gave 2 examples of using the waterfall.. – Alex Brodov Sep 06 '14 at 22:52
  • Right. (If there is an error in readFile, _waterfall_ will automatically skip processFile.) And pick the way to write the code that you're most comfortable with, first or second. – Volune Sep 06 '14 at 22:57
  • can i wrap the whole `waterfall function` into a `for loop`? Is it going to complete all the operations inside before the next iteration of the loop? – Alex Brodov Sep 06 '14 at 23:09
  • No, you should use something like [async.eachSeries](https://github.com/caolan/async#eachSeries). Put your use of waterfall in an asynchronous function (like `updateJsonAsync(options,callback)`) and call it with something like `async.eachSeries( [{ticker:...,value:...},...], updateJsonAsync, function(err){/*handle error*/} )` – Volune Sep 06 '14 at 23:18
  • so i can put the waterfall function into a wrapper function in this way: `updateJsonAsync(ticker, value, callback)` right? The 2nd argument that the waterfall get is a callback of my wrapper function is it right? – Alex Brodov Sep 06 '14 at 23:33
  • If you want to use it with _eachSeries_, the _updateJsonAsync_ function must be like `iterator(item, callback)` ([see documentation](https://github.com/caolan/async#eacharr-iterator-callback)), so only one argument before the callback. And right, you can use the callback of your wrapper function as 2nd argument of the waterfall. – Volune Sep 06 '14 at 23:38
  • I know this is a few months old but I'm super grateful for your explanation of async waterfall. I'm using it for a class project! – Keith Yong Feb 21 '15 at 15:30
  • Death to my callback hell! Then put a stake through it, and sprinkle some holy water on it! Thanks man. ^_^ – Anthony Jun 10 '15 at 21:22
  • This is a VERY succinct answer. Examples are clear and concise. Kudos on you! Maybe YOU should write the docs!!!! :-) – james emanon Jan 12 '16 at 01:39
  • Great answer! Bonus for mentioning the underscore library's use of partial function evalution. However, I also wanted to mention alternate ways of passing arguments to the first function in case you don't want to include an additional library such as underscore or lodash for this purpose alone: *1. Use the ES5+ bind() method as follows:* `fs.readFile.bind(null, 'stocktest.json')` *2. Use async.apply() as follows:* `async.apply(fs.readFile, 'stocktest.json')` – Alan C. S. Aug 20 '16 at 19:36
14

First and foremost, make sure you read the documentation regarding async.waterfall.

Now, there are a couple of key points about the waterfall control flow:

  1. The control flow is specified by an array of functions for invocation as the first argument, and a "complete" callback when the flow is finished as the second argument.
  2. The functions in the array are invoked in series (as opposed to in parallel).
  3. If an error (usually named err) is encountered at any operation in the flow array, it will short-circuit and immediately invoke the "complete"/"finish"/"done" callback.
  4. Arguments from the previously executed function are applied to the next function in the control flow, in order, and an "intermediate" callback is supplied as the last argument. Note: The first function only has this "intermediate" callback, and the "complete" callback will have the arguments of the last invoked function in the control flow (with consideration to any errors) but with an err argument prepended instead of an "intermediate" callback that is appended.
  5. The callbacks for each individual operation (I call this cbAsync in my examples) should be invoked when you're ready to move on: The first parameter will be an error, if any, and the second (third, fourth... etc.) parameter will be any data you want to pass to the subsequent operation.
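To make these rules concrete, here is a deliberately simplified stand-in that follows them. It is an illustration only: `miniWaterfall` is a hypothetical name and this is not the async library's actual code.

```javascript
// Simplified stand-in for illustration only; NOT the async library's code.
function miniWaterfall(tasks, complete) {
    var index = 0;
    function next(err) {
        var results = Array.prototype.slice.call(arguments, 1);
        // Rule 3: an error short-circuits; also finish when no tasks remain.
        if (err || index === tasks.length) {
            return complete.apply(null, [err || null].concat(results));
        }
        var task = tasks[index++];
        // Rule 4: previous results come first, the "intermediate" callback last.
        task.apply(null, results.concat(next));
    }
    next(null); // the first task receives only the "intermediate" callback
}

miniWaterfall([
    function (cb) { cb(null, 2); },
    function (n, cb) { cb(null, n * 10, 'extra'); },
    function (n, label, cb) { cb(null, n + 1); }
], function (err, result) {
    console.log(err, result); // null 21
});
```

Each step decides what the next step receives by choosing the arguments it passes after the error slot.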

The first goal is to get your code working almost verbatim alongside the introduction of async.waterfall. I decided to remove all your console.log statements and simplified your error handling. Here is the first iteration (untested code):

var fs = require('fs'),
    async = require('async');

function updateJson(ticker,value) {
    async.waterfall([ // the series operation list of `async.waterfall`
        // waterfall operation 1, invoke cbAsync when done
        function getTicker(cbAsync) {
            fs.readFile('stocktest.json',function(err,file) {
                if ( err ) {
                    // if there was an error, let async know and bail
                    cbAsync(err);
                    return; // bail
                }
                var stocksJson = JSON.parse(file);
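                if ( stocksJson[ticker] == null ) { // loose check also catches a missing (undefined) ticker
                    // if we don't have the ticker, let "complete" know and bail
                    cbAsync(new Error('Missing ticker property in JSON.'));
                    return; // bail
                }
                stocksJson[ticker].price = value; // update the price, as in the question's code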
                // err = null (no error), jsonString = JSON.stringify(...)
                cbAsync(null,JSON.stringify(stocksJson,null,4));    
            });
        },
        function writeTicker(jsonString,cbAsync) {
            fs.writeFile('stocktest.json',jsonString,function(err) {
                cbAsync(err); // err will be null if the operation was successful
            });
        }
    ],function asyncComplete(err) { // the "complete" callback of `async.waterfall`
        if ( err ) { // there was an error with either `getTicker` or `writeTicker`
            console.warn('Error updating stock ticker JSON.',err);
        } else {
            console.info('Successfully completed operation.');
        }
    });
}
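
One caveat with the code above: `JSON.parse` throws on malformed input, and an exception thrown inside the `fs.readFile` callback is not routed to the "complete" callback. A minimal sketch of one way to handle this follows; `parseJson` is a hypothetical helper name, not part of the async module.

```javascript
// Hypothetical helper: converts a JSON.parse exception into an
// error-first callback so the waterfall can short-circuit normally.
function parseJson(text, callback) {
    var parsed;
    try {
        parsed = JSON.parse(text);
    } catch (e) {
        return callback(e);
    }
    // Called outside the try block so that exceptions thrown by the
    // callback itself are not mistaken for parse errors.
    callback(null, parsed);
}
```

Inside `getTicker`, the parse step could then become `parseJson(file, function (err, stocksJson) { ... })`, passing any `err` on to `cbAsync`.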

The second iteration divides up the operation flow a bit more. It puts it into smaller single-operation oriented chunks of code. I'm not going to comment it, it speaks for itself (again, untested):

var fs = require('fs'),
    async = require('async');

function updateJson(ticker,value,callback) { // introduced a main callback
    var stockTestFile = 'stocktest.json';
    async.waterfall([
        function getTicker(cbAsync) {
            fs.readFile(stockTestFile,function(err,file) {
                cbAsync(err,file);
            });
        },
        function parseAndPrepareStockTicker(file,cbAsync) {
            var stocksJson = JSON.parse(file);
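            if ( stocksJson[ticker] == null ) { // loose check also catches a missing (undefined) ticker
                cbAsync(new Error('Missing ticker property in JSON.'));
                return;
            }
            stocksJson[ticker].price = value; // update the price, as in the question's code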
            cbAsync(null,JSON.stringify(stocksJson,null,4));
        },
        function writeTicker(jsonString,cbAsync) {
            fs.writeFile(stockTestFile,jsonString,function(err) {
                cbAsync(err);
            });
        }
    ],function asyncComplete(err) {
        if ( err ) {
            console.warn('Error updating stock ticker JSON.',err);
        }
        callback(err);
    });
}

The last iteration short-hands a lot of this with the use of some bind tricks to decrease the call stack and increase readability (IMO), also untested:

var fs = require('fs'),
    async = require('async');

function updateJson(ticker,value,callback) {
    var stockTestFile = 'stocktest.json';
    async.waterfall([
        fs.readFile.bind(fs,stockTestFile),
        function parseStockTicker(file,cbAsync) {
            var stocksJson = JSON.parse(file);
            if ( stocksJson[ticker] == null ) { // loose check also catches a missing (undefined) ticker
                cbAsync(new Error('Missing ticker property in JSON.'));
                return;
            }
            cbAsync(null,stocksJson);
        },
        function prepareStockTicker(stocksJson,cbAsync) {
            stocksJson[ticker].price = value; // update the price, as in the question's code
            cbAsync(null,JSON.stringify(stocksJson,null,4));
        },
        fs.writeFile.bind(fs,stockTestFile)
    ],function asyncComplete(err) {
        if ( err ) {
            console.warn('Error updating stock ticker JSON.',err);
        }
        callback(err);
    });
}
zamnuts
  • Thanks a lot now i understand it much more better ! I got another question: What if i need to perform this whole operation a lot of times let's say 5 times can i wrap it all into a `for loop` the whole `function`? Is it going to work , i mean is it going to finish all the operations inside until the second iteration of the loop? – Alex Brodov Sep 06 '14 at 23:02
  • @user3502786 you cannot use a `for` loop, if you do then multiple `async.waterfall` operations will be run in parallel. Instead use `async.eachSeries`, `async.whilst`, or `async.until`. These are equivalent to a `for` loop, but will wait until async's `callback` is invoked before moving on to the next iteration (in other words, a `for` loop that will yield). – zamnuts Sep 06 '14 at 23:10
  • @user3502786 Example: `var tickers = [{ticker:'GOOG',value:1},{ticker:'YHOO',value:2}]; async.eachSeries(tickers,updateJson,function(err){/*done*/});` but you'll have to change your `updateJson` to `function updateJson(obj,callback){var ticker = obj.ticker,value = obj.value;async.waterfall([/* flow operations go here */],callback)};` – zamnuts Sep 06 '14 at 23:18
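
The difference between a plain `for` loop and the series loop described in these comments can be sketched without the library. `forEachInSeries` and `fakeUpdate` are hypothetical names; this is a simplified illustration of the idea, not `async.eachSeries` itself.

```javascript
// Simplified illustration: start the next item only after the previous
// item's callback has fired, and stop at the first error.
function forEachInSeries(items, iterator, done) {
    var index = 0;
    function next(err) {
        if (err || index === items.length) {
            return done(err || null);
        }
        iterator(items[index++], next);
    }
    next(null);
}

// Hypothetical stand-in for updateJson(obj, callback); setTimeout simulates I/O.
var log = [];
function fakeUpdate(obj, callback) {
    log.push('start ' + obj.ticker);
    setTimeout(function () {
        log.push('end ' + obj.ticker);
        callback(null);
    }, 10);
}

var tickers = [{ ticker: 'GOOG', value: 1 }, { ticker: 'YHOO', value: 2 }];
forEachInSeries(tickers, fakeUpdate, function (err) {
    console.log(log.join(', '));
    // start GOOG, end GOOG, start YHOO, end YHOO
});
```

With a plain `for` loop calling `fakeUpdate` directly, both "start" entries would be logged before either "end" entry, because the loop does not wait for the callbacks.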
2

Basically, Node.js (and more generally JavaScript) functions that take some time to complete, usually because of I/O, are typically asynchronous. The event loop (put simply, a loop that continuously checks for tasks to execute) can therefore run the code right after the call without blocking while it waits for a response. If you are familiar with languages like C or Java, you can think of an asynchronous function as one that runs on another thread (this is not necessarily true in JavaScript, but the programmer usually doesn't need to care about it); when the work finishes, that thread notifies the main one (the event loop) that the job is done and hands over the results.

As said, once the first function has finished its job, it must be able to signal that it is done, and it does so by invoking the callback function you pass to it. For example:

// Node's convention is error-first: the error comes before the data.
var callback = function(err, data)
{
   if(!err)
   {
     // do something with the received data
   }
   else
   {
     // something went wrong
   }
};


asyncFunction1(someparams, callback);

asyncFunction2(someotherparams);

The execution flow would call asyncFunction1, then asyncFunction2 and every statement below it, without waiting for asyncFunction1 to finish. When asyncFunction1 completes, the callback passed as its last parameter is called to do something with the data if no error occurred.
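
This ordering can be observed with a small runnable sketch. The two functions here are hypothetical stand-ins that use `setTimeout` to simulate slow I/O, and the callback follows Node's error-first convention.

```javascript
var order = [];

// Hypothetical stand-in: starts work and calls back later.
function asyncFunction1(params, callback) {
    order.push('asyncFunction1 called');
    setTimeout(function () {
        callback(null, 'data for ' + params);
    }, 10);
}

// Hypothetical stand-in: nothing here waits for asyncFunction1.
function asyncFunction2(params) {
    order.push('asyncFunction2 called');
}

asyncFunction1('first', function (err, data) {
    order.push('callback of asyncFunction1');
    console.log(order.join(' -> '));
    // asyncFunction1 called -> asyncFunction2 called -> callback of asyncFunction1
});
asyncFunction2('second');
```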

So, to make two or more asynchronous functions execute one after another, each starting only when the previous one has finished, you have to call each one inside the previous one's callback:

// Error-first callbacks, following Node's convention.
asyncTask1(data, function(err, result1)
{
   if(!err)
   {
     asyncTask2(data, function(err2, result2)
     {
        if(!err2)
        {
          // maybe call a third async function here
        }
        else
          console.log(err2);
     });
   }
   else
     console.log(err);
});

result1 is the result that asyncTask1 passes to its callback, and result2 is the result that asyncTask2 passes to its callback. You can nest as many asynchronous functions as you want this way.

In your case if you want another function to be called after updateJson() you must call it after this line:

console.log("File successfully written");
Paul Kearney - pk
MastErAldo
  • Good explanation of asynchronous operations, however the question is specifically regarding [caolan's async module](https://github.com/caolan/async). I believe the OP understands how asynchronous code works seeing that they are already nesting asynchronous functions. – zamnuts Sep 06 '14 at 22:46