How to handle multiple tabs in puppeteer-cluster[CONCURRENCY_BROWSER]?

Question

I'm trying to scrape 3 URLs under the following conditions:

Each URL needs to run in a separate browser.
Each URL's page may contain 2 or more links to click.
Each link should open in a new tab of its respective browser (in parallel); I then switch to that tab and scrape its content.

In other words, I am trying to open a URL in a browser, fetch the links on the page, open one new tab per fetched link in the same browser, switch between the tabs, click a button in each, and get the confirmation message.

I also need to run the 3 URLs in parallel.

I have tried the CONCURRENCY_BROWSER option to run the URLs in parallel, but I am not able to open a link in a new tab. Any suggestions on how I can manipulate tabs in puppeteer-cluster?

What I need is:

async function test(){
    const cluster = await Cluster.launch({
        puppeteerOptions: {
            headless: false,
            defaultViewport: null, 
        },
      
        concurrency: Cluster.CONCURRENCY_BROWSER,
        maxConcurrency: 5,
        skipDuplicateUrls : true,
        timeout : 240000,
    });

    // start the cluster task for a set of URLs from the cluster queue

    await page.goto(url);

    // on visiting the page I retrieve 2 or more links and store them in an array

    let linksArray = [...subUrl];

    // load each sub-URL in a new tab of the same browser

    await cluster.newPage()

    // take a screenshot of the sub-URL

    await page.screenshot(suburl)
        
}

TypeError: cluster.newPage is not a function

In plain Puppeteer I used to open a new tab with await browser.newPage().

score 1 · Answer 1 · answered Jul 31 '19 at 17:30

Author of puppeteer-cluster here. It is not easily possible to re-use the same browser. However, you can define one task with multiple page.goto calls inside, like this:

const cluster = await Cluster.launch(/* ... */);

// define the task and reuse the window 
await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    const secondUrl = /* ... */; // extract another URL somehow
    await page.goto(secondUrl);
    await page.screenshot(/* ... */);
});

// queue your initial links
cluster.queue('http://...');
cluster.queue('http://...');
// ...
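
To extend this to several sub-links, the same tab can visit them one after another. The following is only a minimal sketch of that idea, not part of puppeteer-cluster itself: the helper name visitLinksSequentially, the linkSelector and handlePage parameters, and the 'a.sub-link' selector are all assumptions made up for illustration. It relies only on page.$$eval and page.goto from Puppeteer.

```javascript
// Hypothetical helper: collect link targets from the current page, then
// visit each one in turn in the same tab.
// `linkSelector` and `handlePage` are illustrative assumptions.
async function visitLinksSequentially(page, linkSelector, handlePage) {
    // $$eval runs the callback inside the page on all matching elements.
    const subUrls = await page.$$eval(linkSelector, (anchors) =>
        anchors.map((a) => a.href)
    );

    const results = [];
    for (const subUrl of subUrls) {
        await page.goto(subUrl);
        results.push(await handlePage(page, subUrl));
    }
    return results;
}

// Possible usage inside a task (selector and file naming are assumptions):
// await cluster.task(async ({ page, data: url }) => {
//     await page.goto(url);
//     await visitLinksSequentially(page, 'a.sub-link', async (p, subUrl) => {
//         await p.screenshot({ path: `${encodeURIComponent(subUrl)}.png` });
//     });
// });
```

The sub-links are processed sequentially here, so this trades the per-link parallelism asked for in the question for simplicity; the 3 top-level URLs still run in parallel through the cluster queue.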

score 0 · Answer 2 · edited Feb 26 '22 at 00:27


Here is an example of opening multiple tabs in the same browser instance with plain Puppeteer:

const puppeteer = require('puppeteer');

async function init() {
    const browser = await puppeteer.launch({
        headless: false,
        args: ['--no-sandbox', '--disable-setuid-sandbox'],
    });
    // Open all three tabs in parallel and wait until each has loaded.
    await Promise.all([
        open_tab('http://example1.com', browser),
        open_tab('http://example2.com', browser),
        open_tab('http://example3.com', browser),
    ]);
}

// Open a new tab in the given browser and navigate it to url.
async function open_tab(url, browser) {
    const page = await browser.newPage();
    await page.setViewport({ width: 1200, height: 1000 });
    await page.goto(url);
    return page;
}

edited Feb 26 '22 at 00:27 by Neithan Max · answered Jul 28 '19 at 15:39 by max

I have updated the question with the code snippet, please check it out – Ajai Ganesh Jul 30 '19 at 09:36

score 0 · Answer 3 · answered Jul 14 '23 at 04:51

We can get the browser instance from page.browser() and use it to create a new tab/page:


await cluster.task(async ({ page, data }) => {
    // page.browser() returns the browser that owns this page
    const page2 = await page.browser().newPage();

    // ...rest of the code
});
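
Building on this, one tab per sub-link could be opened in parallel within the same browser. This is a rough sketch under some assumptions: scrapeLinksInTabs and handleTab are hypothetical names, and the '#confirm' and '.confirmation' selectors in the usage comment are placeholders. Only page.browser(), browser.newPage(), goto and close are real Puppeteer calls here. Tabs opened by hand are presumably not tracked by the cluster, so the sketch closes them itself.

```javascript
// Hypothetical helper: open one extra tab per sub-URL in the browser that
// owns `page`, run `handleTab` in each tab in parallel, and always close
// the tab afterwards. Results come back in the same order as `subUrls`.
async function scrapeLinksInTabs(page, subUrls, handleTab) {
    const browser = page.browser();
    return Promise.all(
        subUrls.map(async (subUrl) => {
            const tab = await browser.newPage();
            try {
                await tab.goto(subUrl);
                return await handleTab(tab, subUrl);
            } finally {
                // Close the tab even if navigation or handleTab throws.
                await tab.close();
            }
        })
    );
}

// Possible usage with CONCURRENCY_BROWSER (selectors are assumptions):
// await cluster.task(async ({ page, data: url }) => {
//     await page.goto(url);
//     const links = await page.$$eval('a', (as) => as.map((a) => a.href));
//     const messages = await scrapeLinksInTabs(page, links, async (tab) => {
//         await tab.click('#confirm');
//         await tab.waitForSelector('.confirmation');
//         return tab.$eval('.confirmation', (el) => el.textContent);
//     });
// });
```

If any single tab fails, Promise.all rejects and the whole task fails; wrapping handleTab in its own try/catch would be one way to collect partial results instead.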
