I've been searching everywhere today and racking my brain, to the point that I really don't know what to do with this anymore.

I am trying to do something that sounds simple to me. However, the asynchronous workflow is tripping me up.

  1. Find each ID from crawlJobs in Firestore
  2. Get all the docs of those IDs in another collection
  3. Add the output to an array
  4. Return the array

Because everything is async, I'm getting stuck on steps 2 and 3: step 4 executes before the async code has finished.

My big question: how do I handle this?

Here's my full function:

exports.publicOutput = functions
  .region("us-central1")
  .runWith(runtimeOpts)
  .https.onRequest(async (req, res) => {
    const projectAlias = req.query.projectalias;
    const apiKey = req.query.apikey;
    let status = 404;
    let data = {
      Message:
        "Unauthorized access! Please provide the correct authentication data.",
    };
    let response = data;
    let scrapeStorage = [];

    // check if credentials are provided
    if (!projectAlias || !apiKey) {
      return res.status(status).send(response);
    }

    // when both items provided execute this
    if (projectAlias && apiKey) {
      const snapshot = await db
        .collection("AdaProjects")
        .where("projectAlias", "==", projectAlias)
        .where("hasAccess", "array-contains", apiKey)
        .limit(1)
        .get();

      if (snapshot.empty) {
        return res.status(status).send(response);
      }

      if (!snapshot.empty) {
        snapshot.forEach((doc) => {
          projectData = doc.data();
        });

        status = 200;
      }

      const crawlJobIDs = projectData.crawlJobs;

      let scrapeIDs = []; 

      crawlJobIDs.forEach(async (jobID) => {
        const snapshot = await db
          .collection("scrapes")
          .where("crawlJobID", "==", jobID)
          .get();

        if (snapshot.empty) {
          console.log("not found jobID", jobID);
          return;
        }

        snapshot.forEach((doc) => {
          scrapeIDs.push(doc.id);
          console.log(scrapeIDs); // here everything is fine. But this outputs (logically) after "DONE"
        });

      });
      
      response = scrapeIDs;
    }

    console.log("DONE");
    return res.status(status).send(response);
  });

I've also tried putting everything in a function and awaiting it before the function sends its response.

 async function getAllScrapeIDs(crawlJobIDs) {
      let someData = [];
      try {
        crawlJobIDs.forEach(async (jobID) => {
          const snapshot = await db
            .collection("scrapes")
            .where("crawlJobID", "==", jobID)
            .get();

          if (snapshot.empty) {
            console.log("not found jobID", jobID);
            return;
          }

          snapshot.forEach((doc) => {
            someData.push(doc.id);
          });
        });
      } catch (error) {
        console.log(error);
        return null;
      }

      return someData;
    }

// and then later in the code 
const crawlJobIDs = projectData.crawlJobs;
response = await getAllScrapeIDs(crawlJobIDs);

The response is still empty, because the function returns before the async code has filled the array.

I have also tried writing everything without async/await, using .then()/.catch() instead. Same output: my function finishes before the array is filled with the data I want to output.

I find it mind-bending, because this part, const crawlJobIDs = projectData.crawlJobs;, actually works. Maybe because it only searches for one item?

Renaud Tarnec
Michel K

1 Answer

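It's not a good idea to use async/await within a forEach loop: forEach does not wait for the promises returned by its callback, so the code after the loop runs before the queries have completed. See also the question "Using async/await with a forEach loop".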
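To make the problem concrete, here is a minimal sketch that runs without Firestore. `fakeQuery` is a hypothetical stand-in for an asynchronous call such as a query's `.get()`; it is not part of the original code.

```javascript
// Hypothetical stand-in for an asynchronous call such as a Firestore query.
const fakeQuery = (id) =>
  new Promise((resolve) => setTimeout(() => resolve(`doc-${id}`), 10));

async function withForEach(ids) {
  const out = [];
  // forEach ignores the promise returned by the async callback,
  // so execution continues before any push has happened.
  ids.forEach(async (id) => {
    out.push(await fakeQuery(id));
  });
  return out.length; // number of items at the moment the function returns
}

async function withForOf(ids) {
  const out = [];
  // Inside an async function, for...of pauses at each await (one query at a time).
  for (const id of ids) {
    out.push(await fakeQuery(id));
  }
  return out.length;
}

withForEach([1, 2, 3]).then((n) => console.log("forEach:", n)); // forEach: 0
withForOf([1, 2, 3]).then((n) => console.log("for...of:", n)); // for...of: 3
```

The `for...of` variant works but runs the queries one after another; running them in parallel is usually preferable when they are independent.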

You should use Promise.all() as follows:

    const crawlJobIDs = projectData.crawlJobs;

    let scrapeIDs = [];

    let promises = [];

    crawlJobIDs.forEach((jobID) => {
        promises.push(db
            .collection("scrapes")
            .where("crawlJobID", "==", jobID)
            .get());
    });

    const snapshot = await Promise.all(promises);  

    // snapshot is an Array of QuerySnapshots
    // You need to loop on this Array

    snapshot.forEach(querySnapshot => {
        querySnapshot.forEach(documentSnapshot => {
            scrapeIDs.push(documentSnapshot.id);
        });
    });

I'll leave it to you to handle the cases where the snapshots are empty.
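One possible way to handle the empty cases is sketched below. The helper name `collectScrapeIDs` and passing `db` as a parameter are assumptions made for illustration; the collection and field names are taken from the question.

```javascript
// Sketch: gather scrape document IDs for every crawl job, logging jobs with no match.
// `db` is assumed to be the Firestore instance used in the question.
async function collectScrapeIDs(db, crawlJobIDs) {
  // Start all queries at once and wait until every one of them has resolved.
  const querySnapshots = await Promise.all(
    crawlJobIDs.map((jobID) =>
      db.collection("scrapes").where("crawlJobID", "==", jobID).get()
    )
  );

  const scrapeIDs = [];
  querySnapshots.forEach((querySnapshot, index) => {
    if (querySnapshot.empty) {
      // Results keep the order of the input, so the index maps back to the job ID.
      console.log("not found jobID", crawlJobIDs[index]);
      return;
    }
    querySnapshot.forEach((documentSnapshot) => {
      scrapeIDs.push(documentSnapshot.id);
    });
  });
  return scrapeIDs;
}
```

In the Cloud Function it could then be called as `response = await collectScrapeIDs(db, projectData.crawlJobs);`. Using forEach here is fine because these inner callbacks are synchronous; only the Promise.all() call is awaited.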


Also, you could use another approach for this block:

  if (!snapshot.empty) {
    snapshot.forEach((doc) => {
      projectData = doc.data();
    });

    status = 200;
  }

Since you know that there is at most one document in the querySnapshot (because of .limit(1)), you can do the following:

  if (!snapshot.empty) {
    const doc = snapshot.docs[0];
    projectData = doc.data();
  
    status = 200;
  }
Renaud Tarnec
  • This is great @renaud. It immediately worked. The last note is also a nice one to shave off some lines of code. Thanks also for the references. I'll make a mental note not to use async within forEach loops. I wonder though, what is happening here: `crawlJobIDs.forEach((jobID) => { promises.push( db.collection("scrapes").where("crawlJobID", "==", jobID).get() ); });` Is it that, in combination with `const snapshotGroup = await Promise.all(promises);`, it waits for all of them to be done before moving on and using snapshotGroup to loop over them? – Michel K Sep 11 '20 at 17:11
  • Yes, "the `Promise.all()` method takes an iterable of promises as an input, and returns a single Promise that resolves to an array of the results of the input promises." See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/all – Renaud Tarnec Sep 11 '20 at 17:16
  • Read through that one just a minute ago. I'll go read more about this and its use cases. Again, thanks a lot for the quick reply and saving my weekend before it starts :) – Michel K Sep 11 '20 at 17:20
  • @arnoud, did that already! Did that immediately actually. Also marked it as answer. Maybe it needs time? – Michel K Sep 16 '20 at 07:33
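
As a small illustration of the Promise.all() behaviour discussed in the comments, the sketch below uses a hypothetical `delay` helper instead of real Firestore queries. It shows that the resolved array follows the order of the input promises, not the order in which they finish, and that a single rejection rejects the combined promise.

```javascript
// Hypothetical helper: resolves with `value` after `ms` milliseconds.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

async function demoOrder() {
  // The first promise is the slowest, yet results keep the input order.
  return Promise.all([delay(30, "a"), delay(10, "b"), delay(20, "c")]);
}

async function demoRejection() {
  try {
    await Promise.all([delay(10, "ok"), Promise.reject(new Error("query failed"))]);
    return "resolved";
  } catch (error) {
    // Promise.all rejects as soon as any input promise rejects.
    return error.message;
  }
}

demoOrder().then((results) => console.log(results)); // [ 'a', 'b', 'c' ]
demoRejection().then((message) => console.log(message)); // query failed
```

Because of the second point, wrapping the `await Promise.all(...)` call in try/catch is a reasonable way to return an error status when one of the queries fails.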