
I am trying to figure out how to wait for a promise to resolve before starting the next iteration of a for loop. Someone suggested that I use setInterval() instead of a for loop, which is fine if you can guess how long the promise will take to resolve, but it is obviously not ideal.

const puppeteer = require('puppeteer-extra')
const StealPlugin = require('puppeteer-extra-plugin-stealth')

puppeteer.use(StealPlugin())
let arrayOfUrls = [
    "https://google.com",
    "https://facebook.com",
    "https://youtube.com",
];

let initialIndex = 0;
let finalIndex = 0;

async function scraper(url) {
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({path: 'example' + initialIndex.toString() + '.png'});
    await console.log(url + "  screenshot complete!")
    await browser.close();
}

const interval = setInterval(() => {
    if (initialIndex < arrayOfUrls.length) {
        scraper(arrayOfUrls[initialIndex]);
        initialIndex += 1;
    } else {
        clearInterval(interval);
        console.log("All complete!")
        loopy()
    }
}, 300)

function loopy() {
    setInterval(() => {
        if (finalIndex === arrayOfUrls.length) {
            finalIndex = 0;
        }
        scraper(arrayOfUrls[finalIndex]);
        finalIndex += 1;
    }, 300)
}

The code above is only experimental. What I ultimately want is to make a series of API requests using URLs from a text file and build an array containing one object per URL. That is the const interval = setInterval(() => { part of my code.

Then I want to repeat each request periodically, indefinitely, and check whether the API response has changed. If it has, I want to send myself a notification. This is the loopy() function in my experimental code.

My current implementation works fine if I set the setInterval() delay to something high like 5000ms, but with something low like 300ms the promises cannot be fulfilled quickly enough and I end up getting this warning:

(node:9652) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 exit listeners added to [process]. Use emitter.setMaxListeners() to increase limit

What would be the best way to implement the logic for such a program?


Edit:

After the idea in the comments from WSC I attempted the following and it seems to work.

const puppeteer = require('puppeteer-extra')
const StealPlugin = require('puppeteer-extra-plugin-stealth')

puppeteer.use(StealPlugin())
let arrayOfUrls = [
    "https://google.com",
    "https://facebook.com",
    "https://youtube.com",
];

let initialIndex = 0;
let finalIndex = 0;

async function scraper(url) {
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();
    await page.waitFor(5000)
    await page.goto(url);
    await page.screenshot({path: 'example' + initialIndex.toString() + '.png'});
    await console.log(url + "  screenshot complete!")
    await browser.close();
}

async function initialScrape() {
    if (initialIndex < arrayOfUrls.length) {
        await scraper(arrayOfUrls[initialIndex]);
        initialIndex += 1;
        initialScrape()
    } else {
        console.log("All complete!")
        loopy()
    }
}


async function loopy() {
    if (finalIndex === arrayOfUrls.length) {
        finalIndex = 0;
    }
    await scraper(arrayOfUrls[finalIndex]);
    finalIndex += 1;
    loopy()
}

initialScrape()

I have moved the artificial delay into the scraper() function in the form of await page.waitFor(5000). However, I am not sure whether this implementation is recommended for the program I am trying to build.

knowledge_seeker
  • Don't you just need to do `await scraper(arrayOfUrls[finalIndex]);`? – WSC Aug 07 '20 at 09:41
  • [Resolve promises one after another (i.e. in sequence)?](https://stackoverflow.com/questions/24586110/resolve-promises-one-after-another-i-e-in-sequence) – Andreas Aug 07 '20 at 09:43
  • @WSC I tried to do that by declaring the anonymous function inside the `setInterval()` in `loopy()` as async and then awaiting `scraper(arrayOfUrls[finalIndex]);`, but the problem is that it runs as soon as the program is executed and not when it is called in the `else` block of the first `setInterval()` statement. – knowledge_seeker Aug 07 '20 at 09:49

1 Answer


The async/await syntax works fine with loops. You don't need to take a recursive approach.

async function main() {
    for (let initialIndex=0; initialIndex<arrayOfUrls.length; initialIndex++) {
        await scraper(arrayOfUrls[initialIndex]);
    }
    console.log("All complete!");
    while (true) {
        for (let finalIndex=0; finalIndex<arrayOfUrls.length; finalIndex++) {
            await scraper(arrayOfUrls[finalIndex]);
        }
    }
}
main().catch(console.error);

Or even easier with for … of loops:

async function main() {
    for (const url of arrayOfUrls) {
        await scraper(url);
    }
    console.log("All complete!");
    while (true) {
        for (const url of arrayOfUrls) {
            await scraper(url);
        }
    }
}
main().catch(console.error);

Btw, for performance I would recommend calling puppeteer.launch({headless: false}); only once and then taking all screenshots with the same browser instance.
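A minimal sketch of that suggestion, assuming puppeteer is set up as in the question. The `index` parameter and the `scrapeAll` name are my own, introduced only for illustration. Each URL gets its own page, which is closed afterwards while the browser stays open:

```javascript
// Sketch only: `scrapeAll` and the `index` parameter are hypothetical
// names, not part of the question's code.
async function scraper(browser, url, index) {
    const page = await browser.newPage();
    try {
        await page.goto(url);
        await page.screenshot({path: 'example' + index + '.png'});
        console.log(url + "  screenshot complete!");
    } finally {
        await page.close(); // close only the tab, keep the browser running
    }
}

async function scrapeAll(browser, urls) {
    // Sequential: the next URL starts only after the previous one finishes.
    for (const [index, url] of urls.entries()) {
        await scraper(browser, url, index);
    }
}

// Usage inside an async function (e.g. `main`), with `puppeteer` and
// `arrayOfUrls` as defined in the question:
//     const browser = await puppeteer.launch({headless: false});
//     try {
//         await scrapeAll(browser, arrayOfUrls);
//     } finally {
//         await browser.close();
//     }
```

Passing the index in as an argument also means the screenshot file name no longer depends on a global counter.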

Bergi
  • When I previously used `while (true)` I was advised against it because it is a blocking statement. If I wanted to call a promise-based function `sendNotification()` in the second `for` loop without awaiting it, would the `while (true)` block `sendNotification()` from ever being executed? – knowledge_seeker Aug 07 '20 at 10:51
  • @user13834264 No, `while(true)` is not "blocking" if there is an `await` in the loop body. The `scraper` can run concurrently with your `sendNotification`. – Bergi Aug 07 '20 at 10:53
  • I took the line `const browser = puppeteer.launch({headless: false})` out of the `scraper(url)` body and appended a `.then()` to it to replace the `initialScrape()` function call at the bottom. The line now reads `const browser = puppeteer.launch({headless: false}).then(()=> {initialScrape()});`. However, I am unsure how to get the line `const page = await browser.newPage();` to work now. I am getting the following error: `(node:46908) UnhandledPromiseRejectionWarning: TypeError: browser.newPage is not a function` – knowledge_seeker Aug 07 '20 at 12:13
  • No, you'll need to do `puppeteer.launch({headless: false}).then(browser => initialScrape(browser))` where the instance is passed as an argument into the function doing the looping. Or put the `const browser = await puppeteer.launch({headless: false});` in the `async main` function and pass it to `scraper(browser, url)` every time. – Bergi Aug 07 '20 at 12:20
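
To illustrate the point from the comments that a loop with an `await` in its body does not block other work, here is a small self-contained sketch. Everything in it is a stand-in: `sleep` simulates slow asynchronous work such as `scraper(url)`, `sendNotification` is a dummy rather than a real notification API, and the loop is bounded by a hypothetical `rounds` parameter so that the demo terminates.

```javascript
// Hypothetical helpers for the demo; none of this is from the question.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const log = [];

async function sendNotification(message) {
    await sleep(20); // pretend the notification takes a while to send
    log.push("notified: " + message);
}

async function pollLoop(rounds) {
    // `rounds` bounds the demo; the real loop would be `while (true)`.
    for (let round = 0; round < rounds; round++) {
        await sleep(50); // stands in for `await scraper(url)`
        log.push("scraped round " + round);
        // Not awaited, so the loop moves on immediately. The catch keeps a
        // failure from turning into an unhandled rejection.
        sendNotification("round " + round).catch(console.error);
    }
}
```

Each `await` hands control back to the event loop, so the un-awaited `sendNotification` call can finish while the next round is still waiting.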