2

I'm trying to scrape 3 URLs under the following conditions:

  1. Each URL needs to run in a separate browser.

  2. Each URL's page may contain 2 or more links to click.

  3. Open the links in new tabs of the respective browser (in parallel), switch to each tab, and scrape the content.

In other words, I am trying to open a URL in a browser, fetch the links on the page, open one new tab per fetched link in the same browser, switch to each tab, click a button in it, and get the confirmation message.

I also need to run the 3 URLs in parallel.

I have tried the CONCURRENCY_BROWSER option to run the URLs in parallel, but I am not able to open a link in a new tab. Any suggestions on how I can manipulate tabs in puppeteer-cluster?

What I need is:

async function test(){
    const cluster = await Cluster.launch({
        puppeteerOptions: {
            headless: false,
            defaultViewport: null, 
        },
      
        concurrency: Cluster.CONCURRENCY_BROWSER,
        maxConcurrency: 5,
        skipDuplicateUrls : true,
        timeout : 240000,
    });

    // Start the cluster task for a set of URLs from the cluster queue

    await page.goto(url);

    // On visiting the page, I retrieve 2 or more links and store them in an array

    let linksArray = [...subUrl];

    // Load each sub-URL in a new tab of the same browser

    await cluster.newPage()

    // Screenshot the sub-URL

    await page.screenshot(suburl)
        
}

TypeError: cluster.newPage is not a function

In plain Puppeteer, I used to open a new tab with await browser.newPage().

Ajai Ganesh

3 Answers

1

Author of puppeteer-cluster here. Reusing the same browser is not easily possible. However, you can define a single task that makes multiple page.goto calls, like this:

const cluster = await Cluster.launch(/* ... */);

// define the task and reuse the window 
await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    const secondUrl = /* ... */; // extract another URL somehow
    await page.goto(secondUrl);
    await page.screenshot(/* ... */);
});

// queue your initial links
cluster.queue('http://...');
cluster.queue('http://...');
// ...
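
When a page yields several links rather than one, the same single-tab approach can be extended with a loop. The following is only a minimal sketch: `visitSequentially`, the `linkSelector` parameter, and collecting page titles as the "scraped content" are assumptions made for illustration, not part of puppeteer-cluster.

```javascript
// Hypothetical helper: visits `url`, then each link found on it, one after
// another in the same tab. `linkSelector` is an assumed CSS selector.
async function visitSequentially(page, url, linkSelector = 'a') {
  await page.goto(url);

  // Collect the hrefs first, because navigating away invalidates element handles.
  const links = await page.$$eval(linkSelector, (anchors) =>
    anchors.map((a) => a.href)
  );

  const titles = [];
  for (const link of links) {
    await page.goto(link);
    titles.push(await page.title()); // placeholder for the real scraping step
  }
  return titles;
}
```

It could be registered as the task with `await cluster.task(({ page, data: url }) => visitSequentially(page, url));`. The links are visited one at a time, so this trades parallelism within a browser for simplicity.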
Thomas Dondorf
0

Here is an example of opening multiple tabs in the same browser instance using plain Puppeteer:

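const puppeteer = require('puppeteer');

async function init() {
    const browser = await puppeteer.launch({
        headless: false,
        args: ['--no-sandbox', '--disable-setuid-sandbox'],
    });
    // Open all three tabs concurrently and wait until each has loaded.
    await Promise.all([
        open_tab('http://example1.com', browser),
        open_tab('http://example2.com', browser),
        open_tab('http://example3.com', browser),
    ]);
}

// Opens `url` in a new tab of the given browser and returns the page.
async function open_tab(url, browser) {
    const page = await browser.newPage();
    await page.setViewport({ width: 1200, height: 1000 });
    await page.goto(url);
    return page;
}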
Neithan Max
max
0

We can get the browser instance from page.browser() and use it to create a new tab/page:


    await cluster.task(async ({ page, data }) => {

      const page2 = await page.browser().newPage();

      // ...rest of the code
    });
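
To make the idea concrete, here is a rough sketch of a task that opens every link of a page in its own tab of the same browser, in parallel. The names `scrapeSubLinks`, `main`, and `linkSelector`, the example URL, and the use of the page title as the scraped result are all assumptions for illustration. It also assumes `Cluster.CONCURRENCY_BROWSER`, where each worker has its own browser; with the other concurrency modes the browser behind `page.browser()` is shared between workers.

```javascript
// Hypothetical helper: opens `url`, then opens each link found on it in a
// separate tab of the browser that owns `page`, all tabs in parallel.
async function scrapeSubLinks(page, url, linkSelector = 'a') {
  await page.goto(url);
  const links = await page.$$eval(linkSelector, (anchors) =>
    anchors.map((a) => a.href)
  );

  const browser = page.browser();
  return Promise.all(
    links.map(async (link) => {
      const tab = await browser.newPage();
      try {
        await tab.goto(link);
        // Placeholder for the real work (click a button, read a message, ...).
        return { link, title: await tab.title() };
      } finally {
        await tab.close(); // tabs opened by hand are not cleaned up by the cluster
      }
    })
  );
}

// Assumed wiring into puppeteer-cluster; call main() to run it for real.
async function main() {
  const { Cluster } = require('puppeteer-cluster');
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_BROWSER,
    maxConcurrency: 3,
  });
  await cluster.task(({ page, data: url }) => scrapeSubLinks(page, url));
  cluster.queue('https://example.com');
  await cluster.idle();
  await cluster.close();
}
```

Closing each extra tab in a `finally` block keeps tabs from piling up when a navigation fails, since only the `page` handed to the task is managed by the cluster.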

Min Somai