I have a list of hrefs on a first page:

const hrefsToVisit = await pageHome.$$('.seemore');
for( let hrefToVisit of hrefsToVisit ) {
    var linkToPage = await pageHome.evaluate(el => el.getAttribute("href"), hrefToVisit);
    console.log('link to visit : ', linkToPage);        
}

I want to open all of these hrefs and refresh them continuously and simultaneously. Ideally, each link would open in its own tab in one dedicated "browser"; I would then refresh all the tabs every 10 seconds, for example, and scrape every page after each refresh.

The problem is that I don't know how many tabs I will have, so I can't give each page its own named variable.

For example:

const hrefsToVisit = await pageHome.$$('.seemore');
for( let hrefToVisit of hrefsToVisit ) {
    var linkToPage = await pageHome.evaluate(el => el.getAttribute("href"), hrefToVisit);
    console.log('link to visit : ', linkToPage);  
    var tab1 = await browser2.newPage();   
    await tab1.goto(linkToPage, {waitUntil: 'domcontentloaded'});  
    
}

After that, the only tab I can still reference in my "browser2" is the last page opened, because `tab1` is overwritten on each iteration. I want to open a new page for each link without creating a named "tabX" variable for each one, then refresh all the tabs and scrape the content of every tab.

user2178964
  • How about an array? `tabs = []` ... `tabs.push(await browser.newPage())`, then use `tabs[0]` to get the first tab, `tabs[1]` to get the second, etc. – ggorlen Jul 07 '22 at 00:03
  • Does this answer your question? ["Variable" variables in JavaScript](https://stackoverflow.com/questions/5187530/variable-variables-in-javascript) – ggorlen Jul 07 '22 at 00:04
  • Thx, it works great with a tab. Is there a way to refresh all tabs without doing a loop and calling refresh on each tab? The goal is to refresh the tabs simultaneously to speed up the process :) Thx – user2178964 Jul 07 '22 at 12:53
  • You can spawn a bunch of promises (almost all Puppeteer methods are async) and use `Promise.all` to handle them in parallel. [This answer](https://stackoverflow.com/questions/46293216/crawling-multiple-urls-in-a-loop-using-puppeteer/65000065#65000065) might be useful. – ggorlen Jul 07 '22 at 16:06
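
The suggestion in the comments above (an array of pages plus `Promise.all`) could look roughly like this. It is only a minimal sketch: it assumes `browser` is a Puppeteer `Browser` and that `links` already holds absolute URLs. `openTabs` is a hypothetical helper name, not part of Puppeteer.

```javascript
// Hypothetical helper: opens one tab per link and returns all pages in an array,
// so no individually named "tabX" variables are needed.
// Assumption: `browser` exposes newPage(), and each page exposes goto(), as in Puppeteer.
async function openTabs(browser, links) {
  return Promise.all(
    links.map(async (link) => {
      const tab = await browser.newPage();
      await tab.goto(link, { waitUntil: 'domcontentloaded' });
      return tab; // Promise.all keeps the results in the same order as `links`
    })
  );
}
```

With something like this, `const tabs = await openTabs(browser2, links);` would give `tabs[0]`, `tabs[1]`, and so on, however many links there are.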

1 Answer

Thanks, ggorlen!

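// My first tab holds the home page with all the links to scrape; I open one tab per link
for (const hrefToVisit of hrefsToVisit) {
   const linkToDeal = await tabs[0].evaluate(el => el.getAttribute("href"), hrefToVisit);
   const tab = await browser.newPage();
   // Navigate the new tab to its link; otherwise it stays blank and reload() has nothing to refresh
   await tab.goto(linkToDeal, { waitUntil: "domcontentloaded" });
   tabs.push(tab);
}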
    
// Then I refresh all tabs at the same time
var promises = [];
for (let tab of tabs ) {
   promises.push(tab.reload({ waitUntil: ["domcontentloaded"] }));
}
    
await Promise.all(promises);
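
To repeat the refresh every 10 seconds and scrape after each pass, as the question asks, something along these lines might work. This is a sketch under assumptions, not a tested Puppeteer setup: `refreshAndScrape`, `sleep`, the `scrape` callback and the `intervalMs`/`rounds` options are hypothetical names introduced for illustration, and `tabs` is assumed to be an array of Puppeteer pages.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: reloads every tab in parallel, then runs `scrape` on each tab,
// and repeats this `rounds` times with a pause of `intervalMs` between rounds.
// Assumption: each tab exposes reload(), as a Puppeteer Page does.
async function refreshAndScrape(tabs, scrape, { intervalMs = 10000, rounds = 3 } = {}) {
  const results = [];
  for (let round = 0; round < rounds; round++) {
    // All reloads start together; Promise.all waits until every one has finished.
    await Promise.all(tabs.map((tab) => tab.reload({ waitUntil: 'domcontentloaded' })));
    // Scrape every freshly reloaded tab, again in parallel.
    results.push(await Promise.all(tabs.map((tab) => scrape(tab))));
    if (round < rounds - 1) await sleep(intervalMs);
  }
  return results; // one array of scraped values per round
}
```

For example, `await refreshAndScrape(tabs, (tab) => tab.title())` would collect the page titles of all tabs on each round. Note that `Promise.all` rejects as soon as one reload fails, so a single broken page stops the whole round unless errors are caught per tab.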
user2178964