The main problem is an extra await page.waitForNavigation({ waitUntil: 'networkidle0' }); that will fail to resolve. page.goto already waits for navigation, so you're asking Puppeteer to wait for a navigation that will never happen. Only use page.waitForNavigation if you're doing something to trigger a navigation, not as part of a typical page.goto call. Remove this line and your code should work (more or less) as expected.
Furthermore, you're opening a whole new page (browser tab) per link. That's 360 tabs by my count, liable to run most computers out of memory. Better to navigate a single page repeatedly or close pages after you're finished doing whatever you plan to do on these pages. If that's too slow, try running chunks in parallel or using a task queue.
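As a rough illustration of the "chunks in parallel" idea, here's a minimal sketch rather than a drop-in solution. The helper names chunk, mapInChunks and makeVisitor are made up for this example, and page.title() is only a placeholder for whatever you actually plan to do on each page. Each URL gets a short-lived tab that is always closed, so no more than one chunk's worth of tabs is open at a time.

```javascript
// Split an array into consecutive chunks of at most `size` items.
const chunk = (items, size) => {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
};

// Run `worker` on every item with at most `size` calls in flight at once.
// Each chunk is awaited before the next one starts; results keep input order.
const mapInChunks = async (items, size, worker) => {
  const results = [];
  for (const group of chunk(items, size)) {
    results.push(...(await Promise.all(group.map(item => worker(item)))));
  }
  return results;
};

// Hypothetical worker factory: one tab per URL, closed even if the visit fails.
// `browser` is assumed to be an already-launched Puppeteer browser.
const makeVisitor = browser => async url => {
  const page = await browser.newPage();
  try {
    await page.goto(url, {waitUntil: "domcontentloaded"});
    return await page.title(); // placeholder for the real per-page work
  } finally {
    await page.close();
  }
};
```

With a list of links in hand, this could be called as await mapInChunks(hrefs, 5, makeVisitor(browser)). The chunk size of 5 is an arbitrary starting point, so tune it to your machine and to what the site tolerates. Note that a single failed visit rejects the whole run. Catch errors inside the worker if you'd rather keep going.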
Also, the links are available in the static HTML, so you might not need Puppeteer here, again, depending on what you're planning on doing on each page. If you can get all of the data from each page statically, you could have a massive speedup, completing 360 scrapes with fetch/cheerio in a fraction of the time it'd take Puppeteer.
If you do stick with Puppeteer to bypass detection or deal with JS/interactivity, consider using domcontentloaded rather than networkidle0, which is usually unnecessarily strict and slow. The blog post linked explains the difference between the various loading conditions. See also my answer in the canonical thread "Puppeteer wait until page is completely loaded" for a deeper dive into page loading in Puppeteer.
a[href] is a more precise selector than a, because some a anchors may have no href, and these should be discarded so they don't contribute empty or undefined values to the results.
Here's how I'd write this (with the aforementioned caveat that Puppeteer might not be needed at all):
const puppeteer = require("puppeteer"); // ^14.3.0

let browser;
(async () => {
  browser = await puppeteer.launch({headless: false});
  const [page] = await browser.pages();
  await page.setViewport({width: 1200, height: 720});
  const url = "https://s23.a2zinc.net/clients/acmedia/americancoatingsshow2022/Public/Exhibitors.aspx?Index=All#";
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const hrefs = await page.$$eval("a[href]", els =>
    els.map(a => a.href).filter(e => e.includes("eBooth.aspx"))
  );
  console.log(hrefs.length); // => 360

  for (const url of hrefs) {
    await page.goto(url);
    // page is loaded; do your thing on this page
  }
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());