2

NOTE: This is a cronjob so I need to exit with a process.exit command when the processing is completed.

In this one particular case, I'm going to use some dummy code to represent the problem b/c I believe pasting the code from this script would generate more questions and cause confusion.

I'm curious about the framework choices I've made. I know there is a better way to organize the promises to create maximum parallel efficiency and, at the same time, know when to exit.

The code below may not be syntactically perfect since I'm just writing it by hand here for the purposes of showing the corner I painted myself into.


let promises = [];

promises.push(db.get_customers());

let cus_recs = await Promise.all(promises); promises = [];

cus_recs.map((cus_rec) => {

  promises.push(db.get_criteria(cus_rec));

});

let criteria_recs = await Promise.all(promises); promises = [];

cus_recs.map((cus_rec, a) => {

  let cus_criteria_recs = criteria_recs[a];

  cus_criteria_recs.map((cus_criteria_rec) => {

    ci.build_lis_match_data(cus_criteria_rec).then((lis_match_data) => {

      lis.get_matching_listings(cus_criteria_rec, lis_match_data).then((lis_recs) => {

        lis_recs.map((lis_rec) => {

          ci.build_lis_match_table(cus_criteria_rec, lis_rec).then((lis_match_table) => {

            promises.push(lis.do_the_thing(cus_rec, lis_rec));

          });

        });

      });

    });

  });

});

if (promises.length > 0) {

  await Promise.all(promises); promises = [];

}

process.exit(0);

The issue I am faced with is where to place process.exit(0); If placed as shown, node exists before all the promises are finished.

I am aware I can turn this into for loops and use async/await to just wait on each task.

I was hoping that this method would create extreme parallelism of these intensive tasks. Many of these tasks are > 0.1 secs and this will run for 600,000 (or so) records per day.

Nick
  • 466
  • 1
  • 6
  • 12
  • Just a wild guess here: the main function is not `async` and thus won't deal with your `await`. Try to wrap all of your code into something like `const myLogicAsync = async () => { /* all your code */ };` and then call this: `myLogicAsync(); process.exit(0);` – digitalbreed Aug 18 '21 at 22:34
  • The main function is async but the callback functions within are not. But I'm trying not to await everything in between. – Nick Aug 18 '21 at 22:36
  • You actually should need to explicitly exit. When the script finishes it will exit. – cam Aug 18 '21 at 22:38
  • What I am trying to say here is: the NodeJS entry into your code is not async and it will not honor your `await` like the one in the second last statement. You need to wrap it into an `async` function yourself, like in my example. – digitalbreed Aug 18 '21 at 22:38
  • 1
    `promises`, at then end of the nested `map` code, is not guaranteed to have all of the promises that may result from the `map`s within. But then, you shouldn't be using `map`, since you're not using the resulting arrays. You could use a [`for await ... of` loop](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for-await...of) instead of the nested promises... – Heretic Monkey Aug 18 '21 at 22:39
  • @HereticMonkey No, do not use a `for await … of` loop! There are no asynchronous generators anywhere here. – Bergi Aug 18 '21 at 22:52
  • "*I was hoping that this method would create extreme parallelism of these intensive tasks*" - what are these tasks actually doing? Notice that extreme parallelism might be very bad for throughput because of context switching overhead and eventual queuing around the bottlenecks. Identify what your jobs are actually limited by. – Bergi Aug 18 '21 at 22:56
  • @Bergi That does seem [to be a bee in your bonnet](https://stackoverflow.com/a/59695815/215552) doesn't it? :) – Heretic Monkey Aug 18 '21 at 22:57
  • 1
    @HereticMonkey More like ants in my pants :-) Yes, this [abuse of `for await`](https://stackoverflow.com/q/60706179/1048572) needs to be prevented before becoming too widespread – Bergi Aug 18 '21 at 23:02

1 Answers1

3

The problem is that you're doing promises.push() inside a .then() callback (actually, multiple nested ones even), which runs asynchronously. The promises.length is still 0 when your if statement runs.

Avoid using .push() with .map(), it already creates an array from the return values. So always return a promise from the callback.

let promises = [db.get_customers()];
let cus_recs = await Promise.all(promises);

promises = cus_recs.map((cus_rec) => {
  return db.get_criteria(cus_rec);
//^^^^^^
});
let criteria_recs = await Promise.all(promises);

??? = cus_recs.map((cus_rec, a) => { /*
^^^ what does this return? What *can* we return? */

  let cus_criteria_recs = criteria_recs[a];
  promises = cus_criteria_recs.map((cus_criteria_rec) => {
//^^^^^^^^ this creates an array of promises
    return ci.build_lis_match_data(cus_criteria_rec).then((lis_match_data) => {
//  ^^^^^^ (if they are returned here)

      return lis.get_matching_listings(cus_criteria_rec, lis_match_data).then((lis_recs) => {
//    ^^^^^^ need to use promise chaining
        promises = lis_recs.map((lis_rec) => {
//      ^^^^^^^^ same thing again
          return ci.build_lis_match_table(cus_criteria_rec, lis_rec).then((lis_match_table) => {^
//        ^^^^^^
            return lis.do_the_thing(cus_rec, lis_rec));
//          ^^^^^^
          });
        });
        …
      });
    });
  });
  … // now we have the `promises` array around - so we
  return Promise.all(promises)
});
… // and here again we need to deal with the ??? value

process.exit(0);

In fact I'd recommend not to even assign to a promises variable, just pass the result right into Promise.all - as that's the only reasonable thing you can do with an array of promises. So

const cus_recs = await Promise.all([
  db.get_customers()
]);

const criteria_recs = await Promise.all(cus_recs.map(cus_rec =>
  db.get_criteria(cus_rec)
));

await Promise.all(cus_recs.map((cus_rec, a) => { /*
^^^^^^^^^^^^^^^^^ this comes naturally now */

  let cus_criteria_recs = criteria_recs[a];
  return Promise.all(cus_criteria_recs.map(cus_criteria_rec =>
//^^^^^^^^^^^^^^^^^^
    ci.build_lis_match_data(cus_criteria_rec).then(lis_match_data =>  
      lis.get_matching_listings(cus_criteria_rec, lis_match_data).then(lis_recs =>
        Promise.all(lis_recs.map(lis_rec =>
//      ^^^^^^^^^^^
          ci.build_lis_match_table(cus_criteria_rec, lis_rec).then(lis_match_table =>
            lis.do_the_thing(cus_rec, lis_rec)
          )
        ))
      )
    )
  ));
}));

process.exit(0);

I would also recommend to use async/await everywhere, if you can easily replace the nested .then() calls by it:

await Promise.all(cus_recs.map((cus_rec, a) => {
  let cus_criteria_recs = criteria_recs[a];
  return Promise.all(cus_criteria_recs.map(async cus_criteria_rec => {
    const lis_match_data = await ci.build_lis_match_data(cus_criteria_rec);
    const lis_recs = await lis.get_matching_listings(cus_criteria_rec, lis_match_data);
    return Promise.all(lis_recs.map(async lis_rec => {
      const lis_match_table = await ci.build_lis_match_table(cus_criteria_rec, lis_rec);
      return lis.do_the_thing(cus_rec, lis_rec);
    }));
  }));
}));
Bergi
  • 630,263
  • 148
  • 957
  • 1,375
  • Regarding your last comment here about using async/await here, will that prevent the script from running all the callback functions sent to the .then at once? In other words, is this await on the return value of each callback function going to cause the entire script to pause while the action occurs or ...? – Nick Aug 19 '21 at 16:43
  • No, an `await` only pauses the `async` function that it is located in, i.e. each `map` callback is paused separately. The code in the last snippet works exactly the same as the code in the snippet above it. – Bergi Aug 19 '21 at 17:22