
I expect Promise.all() to run the code for each passed item in parallel. Unfortunately, there is a "pause" after "MAPPING DATA" has been logged for all items and before "CHECKED DB" starts logging progress for any item.

It seems that instead of `const snap = await global.db.collection("Prompts").doc(id).get();` being processed in parallel, all of those calls need to finish before the rest of the code can continue executing.

Why is that the case, considering I am also awaiting `global.db.collection("collection").doc(id).set(data);`, yet that line doesn't seem to "block" the process (`console.log("WRITE FOR: " + id)` is executed continuously)?

How would I execute those requests in parallel while keeping the order in which they are made for each data item (first check the DB, then write if needed)?

N.B.: Testing with 10K docs to write.

async function parallelIndividualWrites(datas) {
  try {
    let mapped = 1;
    let dbcheck = 1;
    let counter = 1;
    await Promise.all(datas.map(async (data) => {
      // ...
      const id = data.id; // assumption: each item carries its document id (elided in the original)
      console.log("MAPPING DATA: " + (mapped++) + "/" + datas.length);
      const snap = await global.db.collection("Prompts").doc(id).get();
      console.log("CHECKED DB: " + (dbcheck++) + "/" + datas.length);
      if (snap.data()) {
        console.log("NO WRITE FOR: " + id);
        console.log("COUNT: " + counter++);
      } else {
        await global.db.collection("collection").doc(id).set(data);
        console.log("WRITE FOR: " + id);
        console.log("COUNT: " + counter++);
      }
    }));
  } catch (err) {
    console.log("WRITING TO DB ERROR: " + err);
  }
}
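The pause can be reproduced without Firestore at all. In this minimal sketch, `fakeGet` is a made-up stand-in for the `get()` round-trip: `map` runs the synchronous part of every callback first, each callback suspends at its `await`, and the code after the `await` only resumes once the promise settles — so all "MAPPING" lines appear before any "CHECKED" line:

```javascript
// Stand-in for a Firestore get(): resolves after a fixed delay (assumption).
const fakeGet = () => new Promise((resolve) => setTimeout(resolve, 50));

async function demo(items) {
  const order = [];
  await Promise.all(items.map(async (item) => {
    order.push("MAPPING " + item); // synchronous part: runs immediately for every item
    await fakeGet();               // suspends here; the next callback's synchronous part starts
    order.push("CHECKED " + item); // resumes only after the simulated round-trip settles
  }));
  return order;
}

demo([1, 2, 3]).then((order) => console.log(order.join(", ")));
// All "MAPPING" entries are logged before any "CHECKED" entry.
```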
TheProgrammer
    This is the same as: https://stackoverflow.com/questions/19120213/parallelizing-tasks-in-node-js Note that you should use these now: https://nodejs.org/api/worker_threads.html The summary is that JavaScript is single threaded, and therefore cannot do parallel tasks. – Michael Coxon Oct 17 '22 at 10:33
    *"I expect Promise.all() to run the code for each passed item in parallel."* `Promise.all` doesn't "run the code" **at all**. Your functions are being run by `map`. They each run until the first `await`, then wait until the promise they're awaiting is settled. When it's settled, they'll be queued to be resumed. There is still only one thread involved (on the JavaScript side), it's just that that thread will service the jobs for the parts of the code between `await` and the next `await` (or `return`, error, or end of function). All that `Promise.all` does is wait for all the promises... – T.J. Crowder Oct 17 '22 at 10:39
  • ...to be fulfilled (or for one of them to be rejected). – T.J. Crowder Oct 17 '22 at 10:40
    `N.B.: Testing with 10K docs to write.` On a side note, using `Promise.all` here is not a good idea. Something like this really wants some form of concurrency, otherwise you can hit thrashing issues, even if firebase has some form of stacking for this, node & firebase will certainly consume more memory. You might want to look into something like -> https://www.npmjs.com/package/p-map – Keith Oct 17 '22 at 10:49
  • Note the marked dupe was irrelevant to this question, so re-opened; worker threads etc. are not the solution to something that is already async. – Keith Oct 17 '22 at 10:57
  • @Keith Thank you :) Concerning Promise.all, used it as it was used in https://stackoverflow.com/questions/58897274/what-is-the-fastest-way-to-write-a-lot-of-documents-to-firestore for "Parallel individual write operations" – TheProgrammer Oct 17 '22 at 11:50
  • @T.J.Crowder Thank you :) How would I solve my problem here then ? (perhaps I should rework the title of the question (EDIT: done)) – TheProgrammer Oct 17 '22 at 11:53
  • @Keith - Strongly disagree, the linked target is how you **actually** do these things in parallel as the original question asked, rather than interleaved. It also explains why the OP sees what they're seeing above. – T.J. Crowder Oct 17 '22 at 11:59
  • @TheProgrammer Yes, there are lots of answers on SO that throw the `Promise.all` answers, without any warnings of concurrency. But if you want Node to scale, it's certainly something that shouldn't be ignored. Of course, don't take my word for it, run your 10,000 requests using `Promise.all`, and then try doing the same using something that handles concurrency, maybe try 100, and then check the resource usage. – Keith Oct 17 '22 at 12:02
  • @T.J.Crowder Have to disagree here, even 10,000 empty promises take resources; what if it's a million, etc. Using concurrency would mean there are only `concurrency`-count promises floating around. Also, what if there is an error 1,000 records in, out of 10,000? Using concurrency means those remaining 9,000 requests are not stacked up. – Keith Oct 17 '22 at 12:05
  • @Keith - And if the system has 8 cores, that's 8x as much work getting done (allowing for good I/O support). But that's not really the point, the point is that the linked question's answers explained what was going on that the OP didn't understand, and explained how you could actually achieve parallelism (within the bounds of the possible). Now, we have a not-very-good question attracting poor answers, which isn't an improvement. I won't get into more back-and-forth on it, it's not worth your time or mine. :-) But I thought it was worth saying. – T.J. Crowder Oct 17 '22 at 12:10
  • (Didn't see your thing about 9,000 requests stacked up. Not sure I see what you mean there, but it probably also doesn't matter. :-) ) – T.J. Crowder Oct 17 '22 at 12:13
  • @T.J.Crowder What's the number of cores got to do with anything? Both are going to use them to the full; using concurrency certainly doesn't stop that unless you set concurrency to 1, and of course that's not what I'd suggest. And by the way, my answer started with `On a side note`, so yes, slightly off topic and deliberately not meant to be on point. – Keith Oct 17 '22 at 12:13
  • @Keith - I'm not talking about `p-map` (hadn't even seen that comment). I'm talking about using promises as the OP did above vs. worker threads and your reason for reopening the question. *"What's the number of cores got to do with anything, both are going to use them to the full"* What makes you think that? Node.js runs the JavaScript code on a single thread, which can only run on one core (at any given time). Worker threads allow JavaScript code to run on multiple cores simultaneously. You're a smart guy so I think I'm misunderstanding you in some way. – T.J. Crowder Oct 17 '22 at 12:17
  • @T.J.Crowder I know JavaScript doesn't run multi-threaded, but I also assumed you knew that most `async` functions in Node already get passed / processed in some form of threads. Firebird is of course not JavaScript; the only overhead here in JS land is whatever IPC protocol Firebird is using, and of course that's a minor part. Yes, there might be some slight improvement, but very little, because the only thing it's going to use the extra cores for is the IPC layer. And then there are of course the extra IPC requirements required for Node (`sendmessage` etc.) that then need handling. – Keith Oct 17 '22 at 12:27
  • @Keith - Yes, of course I am -- **sometimes**. They may or may not be actually handled on other threads; it's entirely possible they're just waiting on I/O managed by Node on its primary I/O handling thread. But I think at **least** half of the back-and-forth above is the result of a misunderstanding, I was saying I strongly disagreed with your claim the linked question was irrelevant, nothing to do with your `p-map` comment. (I agree with you that `p-map` or similar could be useful for avoiding having everything being half-completed.) – T.J. Crowder Oct 17 '22 at 12:31
  • @T.J.Crowder Oh, right, yeah maybe misunderstanding :). Still think web workers are not a great solution though, and from what I remember that's what the linked answer was inferring. Unless Firebird is so fast that the bottleneck is the JavaScript part, but from my experience the speed of the DB is the bottleneck, not JS. Not saying web workers don't have their uses, but be very careful in Node assuming they're a silver bullet for performance gains, especially for APIs that are intrinsically async. If the OP was doing some very heavy JS processing on the results, then of course things change. – Keith Oct 17 '22 at 12:59
  • "Now, we have a not-very-good question" Doing my best with very limited time, sorry :/ @T.J.Crowder Thank you for your answers in the comments though, I really appreciate it :) – TheProgrammer Oct 18 '22 at 10:40
  • @TheProgrammer I think it's nicely written, and brings to the table some legitimate concerns; TJ is not normally this abrupt, maybe had a bad day. Anyway, back to your question: the reason you're getting the order you're getting is that you're sending 10,000 promises to be processed, but you're not going to get any results until your `map` has finished, as `map` is blocking; it's another reason using a promise-based map with concurrency would be much nicer. Again, I will say people on SO throw `Promise.all` into scenarios like yours, and it really is not the best tool for the job. – Keith Oct 18 '22 at 10:53
  • @Keith Thank you for your answers in the comments :) Would you like to write them as a formal answer to the question so it can be accepted ? – TheProgrammer Oct 20 '22 at 08:29

1 Answer


Try wrapping `const snap = await global.db.collection("Prompts").doc(id).get();` in its own async function and calling it here as `await fn()`.

Do the same for `await global.db.collection("collection").doc(id).set(data);`.
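A sketch of what this suggests, with `fakeGet`/`fakeSet` as made-up stand-ins for the Firestore calls (the real client is not shown in the question):

```javascript
// Stand-ins for the Firestore calls (assumptions, for illustration only):
// even ids already exist in the DB, odd ids do not.
const fakeGet = async (id) => ({ data: () => (id % 2 === 0 ? { id } : undefined) });
const fakeSet = async (id, data) => data;

// The suggested restructuring: each awaited call lives in its own async helper.
async function checkDb(id) {
  return fakeGet(id); // in the question: global.db.collection("Prompts").doc(id).get()
}

async function writeDb(id, data) {
  return fakeSet(id, data); // in the question: global.db.collection("collection").doc(id).set(data)
}

async function processOne(id) {
  const snap = await checkDb(id);              // first check the DB
  if (snap.data()) return "NO WRITE FOR: " + id;
  await writeDb(id, { id });                   // then write if needed
  return "WRITE FOR: " + id;
}

Promise.all([1, 2].map(processOne)).then((r) => console.log(r));
// ["WRITE FOR: 1", "NO WRITE FOR: 2"]
```

Note that awaiting the wrapper still suspends at the same point, so on its own this restructuring may not change the ordering described in the question.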