I am trying to scrape the list of results from Google Maps. Example: visit https://www.google.com/maps/search/gym+in+nyc, get all results in an array, then loop: click element #1, extract its data, go back to the results page, and continue with the next element.

const finalData = async function () {
    const arr = [];
    const resultList = [...[...document.querySelectorAll("[aria-label^='Results for']")][0].children].filter((even, i) => !(i % 2));
    for (const eachElement of resultList) {
        let response = await scrapePage(eachElement);
        arr.push(response);
    }
    return arr;
};

async function scrapePage(elem) {
    // Clicks each element
    let click = await elem.click();
    // Grabs Just the Title
    const titleText = await setTimeout(function () {
        let title = document.querySelector(".section-hero-header-title span").innerText;
        return title;
    }, 3000);
    // setTimeout to cause delay before click the back button
    setTimeout(function () {
        document.querySelector(".section-back-to-list-button").click();
    }, 5000);
    return titleText;
}

const final = finalData().then((value) => {
    return value;
});

I have no idea why, when I run the above code in DevTools, only the last result is clicked, or why my const variable "final" ends up filled with an array of random numbers.

Sarabadu
  • You cannot await a `setTimeout`; you can wrap it in a promise, as in this answer: https://stackoverflow.com/a/33292942/1410618 – Sarabadu Dec 06 '20 at 20:27
  • 1
    Does this answer your question? [Combination of async function + await + setTimeout](https://stackoverflow.com/questions/33289726/combination-of-async-function-await-settimeout) –  Dec 06 '20 at 20:56

2 Answers

0

You can try this:

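async function scrapePage(elem) {
    // Click the result; click() is synchronous and returns undefined, so there is nothing to await
    elem.click();

    // Wait 3 seconds for the detail panel to render, then grab just the title
    const titleText = await new Promise((resolve) => {
        setTimeout(function () {
            const title = document.querySelector(".section-hero-header-title span").innerText;
            resolve(title);
        }, 3000);
    });

    // Wait 5 more seconds, then click the back button; awaiting this keeps
    // the caller's loop from clicking the next result before the list is back
    await new Promise((resolve) => {
        setTimeout(function () {
            document.querySelector(".section-back-to-list-button").click();
            resolve();
        }, 5000);
    });

    return titleText;
}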

Sarabadu
  • I tried your solution, but I get the "Cannot read property 'innerText' of null" error. This happened right after it clicked the first element in the list. – user7248043 Dec 07 '20 at 20:30
  • This solution assumes the same selectors as your snippets; please check whether `document.querySelector(".section-hero-header-title-span")` returns any element. If you can create a code box to test with, it would be great to see the HTML code – Sarabadu Dec 07 '20 at 23:38
  • Yes, that piece of code does display the inner text. Screenshot: https://i.imgur.com/Ow4FW3K.png . A code box is not possible, because I wrote that code on the fly in DevTools, and the only way to test it is by visiting https://www.google.com/maps/search/gym+in+nyc and pasting the code into the Console. – user7248043 Dec 08 '20 at 19:04
  • Oh so it’s a typo `section-hero-header-title-span` Instead of `section-hero-header-title span` Fixed on the answer – Sarabadu Dec 08 '20 at 20:43
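
The "innerText of null" error discussed above is typical of fixed delays: if the panel takes longer than expected, `querySelector` returns `null`. One possible alternative, sketched here with a hypothetical `waitFor` helper (not part of any library), is to poll until a lookup function returns something non-null and give up after a timeout. In the browser the lookup could be something like `() => document.querySelector(".section-hero-header-title span")`; the demo below uses a plain variable as a stand-in so that it runs anywhere.

```javascript
// Hypothetical helper: polls `lookup` every `intervalMs` until it returns
// a non-null value, or rejects once `timeoutMs` has elapsed.
function waitFor(lookup, timeoutMs = 5000, intervalMs = 100) {
    return new Promise((resolve, reject) => {
        const startedAt = Date.now();

        const check = () => {
            const found = lookup();
            if (found !== null && found !== undefined) {
                resolve(found);
            } else if (Date.now() - startedAt >= timeoutMs) {
                reject(new Error("waitFor: timed out"));
            } else {
                setTimeout(check, intervalMs);
            }
        };

        check();
    });
}

// Stand-in for the DOM: the "element" appears after 30 ms.
let fakeElement = null;
setTimeout(() => { fakeElement = { innerText: "Some Gym" }; }, 30);

waitFor(() => fakeElement, 1000, 10).then((el) => console.log(el.innerText)); // logs "Some Gym"
```

The timeout and interval values are arbitrary defaults chosen for the sketch; they would need tuning for a real page.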
0

The problem is that you assumed the await operator works on setTimeout; setTimeout, setInterval and related functions do not return promises at all.
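
To see where the numbers in the question come from, here is a minimal sketch (the name `awaitTimerDirectly` is made up for illustration). `setTimeout` returns a timer handle right away (a number in browsers, a `Timeout` object in Node.js), and `await` on a non-promise value simply yields that value, so the function carries on long before the callback runs.

```javascript
// Hypothetical demo: awaiting setTimeout does not wait for the callback.
async function awaitTimerDirectly() {
    let callbackRan = false;

    // setTimeout returns a timer handle immediately; the callback's return
    // value is discarded, so "title" never reaches the caller.
    const handle = await setTimeout(() => {
        callbackRan = true;
        return "title";
    }, 50);

    // Only a microtask tick has passed at this point, not 50 ms.
    return { handle, callbackRan };
}

awaitTimerDirectly().then(({ handle, callbackRan }) => {
    console.log(typeof handle); // "number" in browsers, "object" in Node.js
    console.log(callbackRan);   // false: the callback has not fired yet
});
```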

When working with time-oriented code, I would generally set up a helper function that uses a promise:

// Resolves after the given number of seconds (setTimeout itself expects milliseconds)
const delay = seconds => new Promise(
    resolve => {
        setTimeout(resolve, seconds * 1000);
    }
);

Updating your code to wait for the timeouts from here is simple:

const finalData = async () => {
    const arr = [];
    const resultList = [...document.querySelector("[aria-label^='Results for']").children].filter((even, i) => !(i % 2));

    for (const eachElement of resultList) {
        const response = await scrapePage(eachElement);
        arr.push(response);
    }

    return arr;
};

async function scrapePage(elem) {
    // Click the result. click() is synchronous and returns undefined, so there is nothing to await.
    elem.click();
    // Give the detail panel time to render, then grab just the title
    await delay(3);
    const titleText = document.querySelector(".section-hero-header-title span").innerText;
    // Delay before clicking the back button
    await delay(5);

    document.querySelector(".section-back-to-list-button").click();

    return titleText;
}

const final = finalData();
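
Keep in mind that `final` here holds a promise, not the array itself. A minimal sketch of reading the result, using a stub in place of `finalData` (the stub `fakeFinalData` and its titles are invented for illustration):

```javascript
// Stub standing in for finalData(); the titles are made-up sample values.
const fakeFinalData = async () => ["Gym A", "Gym B"];

// Option 1: attach a handler with .then()
fakeFinalData().then((titles) => {
    console.log(titles.length); // 2
});

// Option 2: await inside an async function
async function main() {
    const titles = await fakeFinalData();
    return titles.join(", ");
}

main().then(console.log); // logs "Gym A, Gym B"
```

In the Chrome DevTools console you can also type `await finalData()` directly, since the console allows top-level await.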

On a related note, this piece of code waits for each result in sequence, which takes at least 8 seconds per iteration:

const finalData = async () => {
    const arr = [];
    const resultList = [...document.querySelector("[aria-label^='Results for']").children].filter((even, i) => !(i % 2));

    for (const eachElement of resultList) {
        let response = await scrapePage(eachElement);
        arr.push(response);
    }
    return arr;
};

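If you wanted to execute every iteration concurrently, you could map the function across the array and use Promise.all. Note that this only makes sense when the iterations are independent; here every call clicks around in the same page, so the sequential loop is probably the safer choice. The concurrent version would look like so: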

const finalData = async () => {
    const resultList = [...document.querySelector("[aria-label^='Results for']").children].filter((_, i) => !(i % 2));

    return Promise.all(resultList.map(scrapePage));
};
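
The ordering difference between the two versions can be seen in a small sketch. It uses a hypothetical `makeTask` stand-in for `scrapePage` and its own millisecond `sleep` helper so that it is self-contained; the event log shows that with `Promise.all` every task starts before any of them finishes.

```javascript
// Minimal sleep helper in milliseconds (defined here so the sketch is self-contained).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for scrapePage: records when each task starts and ends.
const makeTask = (log) => async (id) => {
    log.push(`start ${id}`);
    await sleep(id === 0 ? 40 : 10); // task 0 is deliberately slower
    log.push(`end ${id}`);
    return id;
};

async function runSequentially(ids) {
    const log = [];
    const task = makeTask(log);
    for (const id of ids) {
        await task(id); // the next task starts only after this one ends
    }
    return log;
}

async function runConcurrently(ids) {
    const log = [];
    const task = makeTask(log);
    await Promise.all(ids.map((id) => task(id))); // all tasks start immediately
    return log;
}

runSequentially([0, 1]).then((log) => console.log("sequential:", log.join(", ")));
// sequential: start 0, end 0, start 1, end 1
runConcurrently([0, 1]).then((log) => console.log("concurrent:", log.join(", ")));
// concurrent: start 0, start 1, end 1, end 0
```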
  • I tried your solution, but I think the major problem is with closures. When I call elem.click() in the scrapePage function, it always clicks the last element in the array. I am using var and have also wrapped the function in an IIFE, but only the last element is clicked. – user7248043 Dec 07 '20 at 20:22
  • @user7248043 You should not be using `var`, it causes far too many problems. Might I ask, why are you using it in your case? –  Dec 07 '20 at 20:47
  • @user7248043 Do you have an example webpage that I could test the code on? –  Dec 07 '20 at 20:48
  • You can open the Maps page below and paste the code into the console: https://www.google.com/maps/search/gym+in+nyc The idea is to iterate through all search results in Google Maps. When I use var, the closure problem is addressed, but the delay doesn't wait for 3 seconds. – user7248043 Dec 08 '20 at 18:19
  • @user7248043 But this isn't all of your code, is it? I need the whole picture. –  Dec 08 '20 at 18:28
  • That is all my code. If I can loop through all the array elements in the DevTools window and successfully bring back at least the title text, then I would build a Chrome extension based on this. The code is pretty straightforward: it grabs all the divs in the results and uses a loop to click each element from the array, bring some data back, and continue iterating. – user7248043 Dec 08 '20 at 18:47
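
For reference, the `var` closure effect mentioned in the comments above can be reproduced in isolation. This is only a sketch with a made-up `collectWith` function, and it may or may not be what happened in the asker's actual code: callbacks created in a `var` loop all share one binding and see its final value, whereas `let` creates a fresh binding per iteration.

```javascript
// Collects the loop index seen by each delayed callback.
function collectWith(kind) {
    return new Promise((resolve) => {
        const seen = [];
        const done = () => { if (seen.length === 3) resolve(seen); };

        if (kind === "var") {
            // One shared `i` for the whole loop; it is 3 by the time the callbacks run.
            for (var i = 0; i < 3; i++) {
                setTimeout(() => { seen.push(i); done(); }, 0);
            }
        } else {
            // A new `j` binding per iteration; each callback keeps its own value.
            for (let j = 0; j < 3; j++) {
                setTimeout(() => { seen.push(j); done(); }, 0);
            }
        }
    });
}

collectWith("var").then((seen) => console.log(seen)); // [3, 3, 3]
collectWith("let").then((seen) => console.log(seen)); // [0, 1, 2]
```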