I'm seeing varying behavior here depending on whether I run headlessly or not. If I run headfully, clicking each link seems to pop open a new page, which I can capture, then pull the job's description:
const puppeteer = require("puppeteer"); // ^19.7.5

const url = "<Your URL>";

let browser;
(async () => {
  browser = await puppeteer.launch({headless: false});
  const [page] = await browser.pages();
  await page.goto(url, {waitUntil: "domcontentloaded"});
  await page.waitForSelector("a.jcs-JobTitle");
  const descriptions = [];

  for (const job of await page.$$("a.jcs-JobTitle")) {
    await job.click();

    // The click opens a new tab; find it by checking its opener
    const newTarget = await browser.waitForTarget(target =>
      target.opener() === page.target()
    );
    const newPage = await newTarget.page();
    const el = await newPage.waitForSelector("#jobDescriptionText");
    descriptions.push(await el.evaluate(e => e.textContent.trim()));

    // Close the tab so the next iteration only matches the newly opened one
    await newPage.close();
  }

  console.log(descriptions);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
But when I run headlessly the selector doesn't seem to show up.
When I run Firefox manually, I see that the description appears in a sidebar that doesn't involve a navigation, so there's clearly some variable behavior at hand.
One workaround that should handle both cases is to grab the URL from the link for each description, then use page.goto() to navigate to it, hopefully bypassing the click behavior differences:
// ...
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36";
  await page.setUserAgent(ua);
  await page.goto(url, {waitUntil: "domcontentloaded"});
  await page.waitForSelector("a.jcs-JobTitle");
  const descriptions = [];

  // Collect the URLs up front, since navigating away invalidates element handles
  const hrefs = await page.$$eval(
    "a.jcs-JobTitle",
    els => els.map(e => e.href)
  );

  for (const href of hrefs) {
    await page.goto(href, {waitUntil: "domcontentloaded"});
    const el = await page.waitForSelector("#jobDescriptionText");
    descriptions.push(await el.evaluate(e => e.textContent.trim()));
  }

  console.log(descriptions);
// ...
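As a hedged extension of the loop above, the per-URL work could be pulled into a helper that records `null` for a page whose description never appears, rather than aborting the whole run on the first failure. The name `collectDescriptions`, the `null` placeholder, and the 15-second timeout are illustrative assumptions of mine, not something the site or Puppeteer requires:

```javascript
// Hypothetical helper: visit each URL with an existing Puppeteer `page` and
// collect the trimmed text of `selector`, or null when a page fails.
async function collectDescriptions(page, hrefs, {
  selector = "#jobDescriptionText",
  timeout = 15_000, // assumed value; tune for the target site
} = {}) {
  const descriptions = [];

  for (const href of hrefs) {
    try {
      await page.goto(href, {waitUntil: "domcontentloaded", timeout});
      const el = await page.waitForSelector(selector, {timeout});
      descriptions.push(await el.evaluate(e => e.textContent.trim()));
    } catch (err) {
      // Navigation or the selector wait failed (for example, a timeout):
      // log it and push a placeholder so results stay aligned with hrefs.
      console.error(`Failed to scrape ${href}: ${err.message}`);
      descriptions.push(null);
    }
  }

  return descriptions;
}
```

It would replace the `for` loop with `const descriptions = await collectDescriptions(page, hrefs);`. Keeping a placeholder per URL makes it easy to see which links failed and retry only those.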
Note that I'm overriding the user agent with page.setUserAgent() to avoid detection in headless mode.
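One way to avoid hardcoding a version-specific string is to derive the user agent from the browser's own value. This is a minimal sketch, assuming the headless user agent differs from the regular one only by a `HeadlessChrome` token (worth verifying against your Chrome build); `toHeadfulUserAgent` is a hypothetical helper name:

```javascript
// Hypothetical helper: turn a headless Chrome user agent into its regular
// counterpart by swapping the "HeadlessChrome" token for "Chrome".
// Strings without that token are returned unchanged.
function toHeadfulUserAgent(ua) {
  return ua.replace("HeadlessChrome", "Chrome");
}

// Usage inside an async function, given a launched `browser` and its `page`:
//   await page.setUserAgent(toHeadfulUserAgent(await browser.userAgent()));
```

This keeps the reported Chrome version in sync with the browser that is actually running, although a site may still detect automation through other signals.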