
I am planning on running a large number of queries against Firebase, which could grow to the order of a few hundred thousand or even millions. I've been using Promise.all() to resolve most of my queries, but as the number of requests grows, Promise.all() seems to just stop running at a random point. I've looked into using Promise.map(), but I'm not sure whether its concurrency option will solve the problem. Thank you for your help.

Below is a simplified example; as you can see, it appears to just time out without throwing any error:

var promises = [];
var i = 0;
for (var x = 0; x < 1000000; x++) {
  const promise = new Promise((resolve, reject) => {
    setTimeout(() => {
      i += 1;
      resolve(i);
    }, 10);
  });
  promises.push(promise);
}


Promise.all(promises).then((value) => {
  console.log(value);
}).catch((error) => {
  console.log("Error making js node work: " + error);
});
  • You probably shouldn't try to initialize so many Promises at once. Use a rate limiter to avoid more than N ongoing Promises at once, maybe? – CertainPerformance Dec 28 '18 at 20:59 (a sketch of this idea appears after these comments)
  • Your only limitation here is a physical one, you can't just fire up millions of requests unless you are confident the hardware will be able to cope with it. – James Dec 28 '18 at 21:03
  • FYI - Another concern you might face may be the cloud function timing out after a certain period of time... Check out [this question](https://stackoverflow.com/questions/43353687/set-timeout-for-cloud-functions-for-firebase-does-not-persist-in-the-console-is) for instructions to edit that function's timeout. (*or if the link goes down, summary is to visit https://console.cloud.google.com/functions/list `select function` > `test function` > `edit` > `timeout`*) – JeremyW Dec 28 '18 at 21:04
  • Wouldn't the above code cause a call stack range exception as the number of promise functions grows higher? – jenil christo Dec 28 '18 at 21:04
  • @CertainPerformance Thank you for your comments, would it be possible to deallocate promises from the array of promises once they are fulfilled to limit unused space? – TheRedCamaro3.0 3.0 Dec 28 '18 at 21:06
  • Which `Promise.all`? Have you tried using [Bluebird](http://bluebirdjs.com/docs/api/promise.all.html)? Have you tried splitting this up into *N* promises that are proportionately *1/N* the size of your original array? When things stop it's usually because of some kind of error, so see if you can coax one out of it. The aggressive use of a timer could be one issue; try without that, and see if this is the most minimal failure case you can create. It's also worth testing with the absolute latest version of Node.js to be sure this isn't a bug. – tadman Dec 28 '18 at 21:10
  • @tadman Thank you for your comment. I tried using Bluebird's Promise.map() with concurrency: 1000 and it still took about 10 seconds to process an array of promises about 100,000 large. I tried raising the concurrency higher but encountered a "Maximum call stack size exceeded" error. Any suggestions on how to proceed? Thank you – TheRedCamaro3.0 3.0 Dec 31 '18 at 06:29
  • There are limits on how many things you can process concurrently. If you need to do a whole bunch, either look at a multi-process approach or consider using [Web Workers](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API) to break this out more. You could also approach this using streams and parallel workers instead of slamming everything into promises. – tadman Jan 01 '19 at 22:24
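
The rate-limiting idea from the comments above can be sketched as a small worker pool. This is a minimal illustration, not from the question; `runWithConcurrency` and the task shape are invented, and it assumes each task is a function that returns a promise, so nothing starts until a worker picks it up:

// Run tasks with at most `limit` promises in flight at once.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  async function worker() {
    while (next < tasks.length) {
      const index = next++; // claim the next unstarted task
      results[index] = await tasks[index]();
    }
  }

  // Start `limit` workers that drain the shared task list in parallel.
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;
}

// Usage with the question's setTimeout example:
const tasks = [];
for (let x = 0; x < 1000000; x++) {
  tasks.push(() => new Promise((resolve) => setTimeout(() => resolve(x), 10)));
}
runWithConcurrency(tasks, 1000).then((values) => console.log(values.length));

With this shape, only `limit` timers or requests are pending at any moment, which avoids having a million live timeouts at once.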

1 Answer


When I need to do something like this, I usually divide the queries into batches. The batches run one-by-one, but the queries in each batch run in parallel. Here's what that might look like.

const _ = require('lodash');

// How many queries run in parallel within a batch; tune this to what the
// downstream system can handle (the value here is an assumption).
const BATCH_SIZE = 100;

// `queries` is an array of query descriptors; `runQuery` (defined elsewhere)
// takes one descriptor and returns a promise for its result.
async function runAllQueries(queries) {
  const batches = _.chunk(queries, BATCH_SIZE);
  const results = [];
  while (batches.length) {
    const batch = batches.shift();
    // Queries within a batch run in parallel; batches run one-by-one.
    const result = await Promise.all(batch.map(runQuery));
    results.push(result);
  }
  return _.flatten(results);
}
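
For illustration, here is a hedged usage sketch. The `runQuery` implementation, collection name, and IDs are assumptions, not part of the answer:

// Hypothetical: each "query" is a user ID, and runQuery fetches that
// user's document from Firestore.
const admin = require('firebase-admin');
admin.initializeApp();

function runQuery(userId) {
  return admin.firestore().collection('users').doc(userId).get();
}

const userIds = ['alice', 'bob', 'carol']; // in practice, hundreds of thousands

runAllQueries(userIds)
  .then((snapshots) => console.log('Fetched ' + snapshots.length + ' documents'))
  .catch((error) => console.error(error));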

What you see here is similar to a map-reduce. That said, if you're running a large number of queries in a single node (e.g., a single process or virtual machine), you might consider distributing the queries across multiple nodes. If the number of queries is very large and the order in which the queries are processed is not important, this is probably a no-brainer. You should also be sure that the downstream system (i.e., the one you're querying) can handle the load you throw at it.
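
The comments also mention multi-process approaches and Web Workers. As a rough single-machine analogue of distributing work across nodes, here is a hedged sketch using Node's built-in worker_threads module; the file names, `runDistributed`, and the stub `runQuery` are all invented for illustration:

// main.js — split the queries into chunks and hand each chunk to a worker thread.
const { Worker } = require('worker_threads');
const path = require('path');
const _ = require('lodash');

function runChunkInWorker(chunk) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(path.resolve(__dirname, 'query-worker.js'), {
      workerData: chunk,
    });
    worker.on('message', resolve); // the worker posts its chunk's results back
    worker.on('error', reject);
  });
}

async function runDistributed(queries, workerCount) {
  const chunks = _.chunk(queries, Math.ceil(queries.length / workerCount));
  const results = await Promise.all(chunks.map(runChunkInWorker));
  return _.flatten(results);
}

// query-worker.js — runs one chunk of queries and posts the results back.
const { parentPort, workerData } = require('worker_threads');

function runQuery(query) {
  return Promise.resolve(query); // stand-in for a real Firebase query
}

Promise.all(workerData.map(runQuery))
  .then((results) => parentPort.postMessage(results));

The same chunking shape carries over to true multi-node distribution; the worker boundary just becomes a network boundary.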

Brandon
  • Thank you so much for your answer. I have been playing around with it and it really seems to work well for 1,000,000 promises; at 10,000,000 it fails. I know this game could go on forever, but what is the best approach to breaking down the array? – TheRedCamaro3.0 3.0 Dec 30 '18 at 01:32
  • @TheRedCamaro3.0, try using the npm bottleneck package to process the queries in an orderly fashion and not overwhelm the server – platinums Dec 16 '19 at 14:45