2

I know there are several posts about this, but according to those I've found, this should work correctly.

I want to make an HTTP request in a loop, and I do not want the loop to advance to the next iteration until the request callback has fired. I'm using the async library like so:

const async = require("async");
const request = require("request");

let data = [
    "Larry",
    "Curly",
    "Moe"
];

async.forEachOf(data, (result, idx, callback) => {
    console.log("Loop iterated", idx);
    let fullUri = "https://jsonplaceholder.typicode.com/posts";
    request({
        url: fullUri
    }, 
    (err, res, body) => {
        console.log("Request callback fired...");
        if (err || res.statusCode !== 200) return callback(err);
        console.log(result);
        callback();
    });
});

What I see is:

Loop iterated 0
Loop iterated 1
Loop iterated 2
Request callback fired...
Curly
Request callback fired...
Larry
Request callback fired...
Moe

What I need to see is:

Loop iterated 0
Request callback fired...
Larry
Loop iterated 1
Request callback fired...
Curly
Loop iterated 2
Request callback fired...
Moe

Also, if there's a built-in way to do the same thing (async/await? Promises?) so that the async library could be removed entirely, that'd be even better.

I've seen some clever examples of recursion out there, but when I put this to use in a much more complex situation (e.g. multiple request calls per iteration), I feel that approach is hard to follow and isn't as readable.

Tsar Bomba

2 Answers

6

You can ditch async altogether and go for async/await quite easily.

Promisify your request and use async/await

Just turn request into a Promise so you can await on it.

Better yet, just use request-promise-native, which already wraps request in native Promises.
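
If you'd rather not add another dependency, a hand-rolled wrapper is only a few lines. A minimal sketch (the rp name is just chosen to match the examples below; it resolves with the body, roughly like request-promise-native does by default):

const request = require('request')

// a sketch: wrap callback-style request in a Promise so it can be awaited;
// rejects on transport errors and non-200 responses, resolves with the body
const rp = uri =>
  new Promise((resolve, reject) => {
    request({ url: uri }, (err, res, body) => {
      if (err) return reject(err)
      if (res.statusCode !== 200) return reject(new Error('Status ' + res.statusCode))
      resolve(body)
    })
  })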

Serial example

From then on it's a slam dunk with async/await:

const rp = require('request-promise-native')

const users = [1, 2, 3, 4]
const results = []

for (const idUser of users) {
  const result = await rp('http://foo.com/users/' + idUser)

  results.push(result)
}
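
One caveat: await is only valid inside an async function (or at the top level of an ES module in recent Node versions), so in a plain script the loop above would sit in a wrapper like the following sketch, where main is just a placeholder name:

const rp = require('request-promise-native')

const users = [1, 2, 3, 4]

async function main () {
  const results = []
  for (const idUser of users) {
    // each request starts only after the previous one has resolved
    results.push(await rp('http://foo.com/users/' + idUser))
  }
  return results
}

main()
  .then(results => console.log('Fetched', results.length, 'users'))
  .catch(err => console.error(err))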

Parallel example

Now, the problem with the above solution is that it's slow - the requests run serially. That's not ideal most of the time.

If you don't need the result of the previous request for the next request, just go ahead and do a Promise.all to fire parallel requests.

const users = [1, 2, 3, 4]

const pendingPromises = []
for (const idUser of users) {
  // Here we won't `await` on *each and every* request.
  // We'll just prepare it and push it into an Array
  pendingPromises.push(rp('http://foo.com/users/' + idUser))
}

// Then we `await` on a `Promise.all` of those requests
// which will fire all the prepared promises *simultaneously*, 
// and resolve when all have been completed
const results = await Promise.all(pendingPromises)

Error handling

Error handling with async/await is done with plain old try/catch blocks, which I've omitted above for brevity.
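
As a rough sketch of what that could look like around the parallel version (fetchAllUsers is just a placeholder name, not something defined above):

const rp = require('request-promise-native')

async function fetchAllUsers (users) {
  try {
    // fire all requests in parallel; `return await` keeps
    // rejections catchable inside this try block
    return await Promise.all(
      users.map(idUser => rp('http://foo.com/users/' + idUser))
    )
  } catch (err) {
    // Promise.all rejects as soon as any single request rejects
    console.error('One of the requests failed:', err.message)
    return []
  }
}

fetchAllUsers([1, 2, 3, 4]).then(results => console.log(results.length, 'responses'))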

nicholaswmin
  • Thanks, Nicholas. Tried tagging you and can't. This routine will be part of a very large CSV upload. How many pending promises is too many? What if I have 100,000 records? Or...millions? – Tsar Bomba Dec 27 '17 at 23:16
  • Uh oh - You might run into call stack limit issues with that many functions being created - the same problem would be exhibited with your callback-style code though. Also what about memory? Do you intend to load the entire CSV in-memory? Does it fit? You might need to look into streams (https://nodejs.org/api/stream.html). What exactly do you need to do? – nicholaswmin Dec 27 '17 at 23:18
  • I figured as much. Testing with small files now but anticipate much larger. I'm already streaming the file into an API endpoint via multer. I then need to process each row, split it into several records, which are then inserted into tables scattered across a few other microservice endpoints...hence the "request" calls. – Tsar Bomba Dec 27 '17 at 23:24
  • Ok great. Can you open another question for this so I can help you out there? – nicholaswmin Dec 27 '17 at 23:25
  • Done. Many thanks! It can be found here: https://stackoverflow.com/questions/47999919/node-microservices-processing-large-csv-uploads – Tsar Bomba Dec 27 '17 at 23:48
0

If you have many (thousands of) URLs to process, it's best to define a batch size and have the process function call itself recursively, one batch at a time.

It's also best to limit the number of active connections; you can use a throttle helper (lib.throttle in the code below) to cap active connections, or to limit connections within a certain time window (e.g. only 5 per second).
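
To give an idea of what such a throttle does, here is a minimal sketch of a max-active-connections limiter. It is not the linked lib's actual implementation, just an illustration of the shape the code below uses (lib.throttle(10) is created once and the result wraps requestAsPromise per url):

//a sketch of a concurrency limiter, not the linked lib's actual implementation
//maxConcurrent(10) returns a wrapper; everything wrapped by the same wrapper
//shares one counter, so at most 10 calls are in flight at any moment
const maxConcurrent = max => {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= max || queue.length === 0) return;
    active++;
    const [fn, args, resolve, reject] = queue.shift();
    const done = () => { active--; next(); };
    Promise.resolve(fn(...args)).then(
      result => { done(); resolve(result); },
      error => { done(); reject(error); }
    );
  };
  return fn => (...args) =>
    new Promise((resolve, reject) => {
      queue.push([fn, args, resolve, reject]);
      next();
    });
};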

Last but not least: if you use Promise.all, you want to make sure the successes are not all lost when one promise rejects. You can catch rejected requests and resolve with a Fail-type object instead, so the whole batch still resolves.

The code would look something like this:

//lib comes from: https://github.com/amsterdamharu/lib/blob/master/src/index.js
const lib = require("lib");
const request = require("request");

const Fail = function(reason){this.reason=reason;};
const isFail = o=>(o&&o.constructor)===Fail;
const requestAsPromise = fullUri =>
  new Promise(
    (resolve,reject)=>
      request({
        url: fullUri
      }, 
      (err, res, body) => {
        console.log("Request callback fired...");
        if (err || res.statusCode !== 200) return reject(err);//return so we don't fall through and resolve below
        console.log("Success:",fullUri);
        resolve([res,body]);
      })
  )
const process = 
  handleBatchResult =>
  batchSize =>
  maxFunction =>
  urls =>
    urls.length === 0
    ? Promise.resolve([])//base case: no urls left, stop recursing
    : Promise.all(
        urls.slice(0,batchSize)
        .map(
          url=>
            maxFunction(requestAsPromise)(url)
            .catch(err=>new Fail([err,url]))//catch reject and resolve with fail object
        )
      )
      .then(handleBatchResult)//handle this batch's results with the passed-in handler
      .catch(panic=>console.error(panic))
      .then(//recursively call itself with next batch
        _=>
          process(handleBatchResult)(batchSize)(maxFunction)(urls.slice(batchSize))
      );

const handleBatch = results =>{//this will handle results of a batch
  //maybe write successes to file but certainly write failed
  //  you can retry later
  const successes = results.filter(result=>!isFail(result));
  //failed are the requests that failed (the Fail objects created in the .catch above)
  const failed = results.filter(isFail);
  //To get the failed urls you can do (Fail.reason holds the [error,url] pair)
  const failedUrls = failed.map(({reason:[error,url]})=>url);
};

const per_batch_1000_max_10_active = 
  process (handleBatch) (1000) (lib.throttle(10));

//start the process
per_batch_1000_max_10_active(largeArrayOfUrls)
.then(
  result=>console.log("Process done")
  ,err=>console.error("This should not happen:",err)
);

In your handleBatchResult you can store failed requests to a file so you can retry them later (const [error, url] = failedResultItem.reason;). You should give up if a high proportion of the requests are failing.

After handleBatchResult there is a .catch; that is your panic mode. It should never fail there, so I'd advise piping errors to a file (on Linux).
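
As a sketch of that last part, handleBatch above could be extended to append the failed urls to a file for a later retry pass (this assumes Node's built-in fs module; the file name is just a placeholder):

const fs = require("fs");

const handleBatch = results =>{
  const successes = results.filter(result=>!isFail(result));
  const failed = results.filter(isFail);
  failed.forEach(failedResultItem=>{
    //Fail.reason holds the [error,url] pair created in the .catch
    const [error,url] = failedResultItem.reason;
    fs.appendFileSync("failed-urls.txt",url+"\n");//placeholder file name
  });
  console.log("Batch done:",successes.length,"ok,",failed.length,"failed");
};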

HMR