I am using puppeteer to do some testing.

No code written because I don't even know how to approach this.

• I have a list of 10 IDs inside an array

• For each ID, a new page/tab is opened

• I want to run the script for each page/tab without waiting for the previous page/tab to finish before starting the next, i.e. simultaneous execution.

So 10 pages will be running the same script at the same time?

Is this possible with JavaScript and Puppeteer?

Mike

2 Answers

You might want to check out puppeteer-cluster (I'm the author of that library), which supports your use case. The library runs tasks in parallel, but also takes care of error handling, retrying and some other things.

You should also keep in mind that opening 10 pages for 10 URLs is quite costly in terms of CPU and memory. You can use puppeteer-cluster to use a pool of browsers or pages instead.
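To illustrate the general idea of capping how many pages work at once, here is a minimal sketch in plain JavaScript. It is not how puppeteer-cluster is implemented; `runWithLimit` and `worker` are hypothetical names made up for this example, and the Puppeteer usage in the trailing comment assumes a `browser` obtained from `puppeteer.launch()`.

```javascript
// Run `worker` over every item, with at most `limit` calls in flight at once.
// `runWithLimit` and `worker` are hypothetical names for this sketch.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each runner keeps pulling the next unclaimed item until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results; // same order as `items`
}

// Hypothetical usage with Puppeteer (assumes `browser` and `ids` exist):
// await runWithLimit(ids, 4, async (id) => {
//   const page = await browser.newPage();
//   try { /* your script for `id` */ } finally { await page.close(); }
// });
```

With 10 IDs and a limit of 4, no more than four tabs would be open at any moment, which keeps CPU and memory use bounded at the cost of a longer total run.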

Code Sample

You can see a minimal example below. It's also possible to use the library in more complex settings.

const { Cluster } = require('puppeteer-cluster');

(async () => {
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_PAGE, // one page per worker, all sharing a single browser
    maxConcurrency: 4, // Open up to four pages in parallel
  });

  // Define a task to be executed for your data, this function will be run for each URL
  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    // ...
  });

  // Queue URLs (you can of course read them from an array instead)
  cluster.queue('http://www.google.com/');
  cluster.queue('http://www.wikipedia.org/');
  // ...

  // Wait for cluster to idle and close it
  await cluster.idle();
  await cluster.close();
})();
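
Since the question starts from an array of IDs rather than URLs, the queueing step could be sketched as follows. The helper `queueIds`, the `toUrl` callback and the `example.com` URL pattern are assumptions made up for illustration; only `cluster.queue(data)` comes from the library.

```javascript
// Queue one job per ID. `cluster` is expected to be a puppeteer-cluster
// instance whose task was already registered with cluster.task(...).
// `queueIds` and `toUrl` are hypothetical names for this sketch.
function queueIds(cluster, ids, toUrl) {
  for (const id of ids) {
    // The task function receives this object as `data`.
    cluster.queue({ id, url: toUrl(id) });
  }
  return ids.length; // number of jobs queued
}

// Hypothetical usage (the URL pattern is a placeholder):
// queueIds(cluster, ids, (id) => `https://example.com/items/${id}`);
```

The task would then read both values, for example `async ({ page, data }) => { await page.goto(data.url); /* use data.id */ }`.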
Thomas Dondorf
  • Thanks for the library. I'll test it out tonight. Would running headless help with CPU usage in my case? @ThomasDondorf – Mike Jun 02 '20 at 21:15
  • @Mike No, that does not make a difference, but there are multiple [tools](https://stackoverflow.com/a/57295869/5627599) to find out what suits your machine. – Thomas Dondorf Jun 03 '20 at 05:13

Yes. Puppeteer's API is asynchronous, so this works by default: open 10 tabs and start your script on each page without awaiting one page before starting the next.

Here is the sample:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        headless: false
    });
    const ids = ['1', '2', '3'];
    const pool = [];

    for (let index = 0; index < ids.length; index++) {
        pool.push(
            browser.newPage() // create a new page for each id
                .then(async page => {
                    const currentId = ids[index];
                    // your script over the current page; await its steps here
                    // so this promise settles only once the script has finished
                })
        );
    }

    await Promise.all(pool); // wait until every page has finished
    await browser.close(); // close the browser only after all pages are done
})();
vadimk7
  • Will Promise.all() wait until all pages are resolved? Is it possible for the pages that resolve before others to return their values, without having to wait until all are complete? – Mike Jun 03 '20 at 00:51
  • Never mind, I removed the close() call and it worked as intended. With close() in place, the browser closed after just one iteration. – Mike Jun 03 '20 at 01:40
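
On the follow-up question in the comments: one way to receive each page's value as soon as it is ready, while still waiting for all pages before closing the browser, might look like the sketch below. `runPerId`, `script` and `onResult` are hypothetical names introduced for this example; the only thing assumed of `browser` is that it exposes `newPage()` and that pages expose `close()`, as Puppeteer's do.

```javascript
// Start `script` on a fresh page for every ID at once and report each result
// as soon as its page finishes. `runPerId`, `script` and `onResult` are
// hypothetical names for this sketch.
async function runPerId(browser, ids, script, onResult) {
  const jobs = ids.map(async (id) => {
    const page = await browser.newPage();
    try {
      const value = await script(page, id);
      onResult(id, value); // fires right away, not after all pages finish
      return value;
    } finally {
      await page.close(); // always release the tab, even if the script throws
    }
  });

  // allSettled waits for every job and does not reject when one of them fails.
  return Promise.allSettled(jobs);
}

// Hypothetical usage:
// await runPerId(browser, ids, myScript, (id, value) => console.log(id, value));
// await browser.close(); // safe: every page has finished by now
```

Because the returned promise settles only after every script has finished, `browser.close()` can stay in the code without cutting the other pages short.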