
I am trying to use Puppeteer to extract the innerHTML value from a button on a webpage. For now, I am simply trying to await the appearance of the selector so that I can then work with it.

On running the code below, the program times out while waiting for the selector.

const puppeteer = require("puppeteer");

const link =
  "https://etherscan.io/tx/0xb06c7d09611cb234bfcd8ccf5bcd7f54c062bee9ca5d262cc5d8f3c4c923bd32";

async function configureBrowser() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(link);

  return page;
}

async function findFee(page) {
  await page.reload({ waitUntil: ["networkidle0", "domcontentloaded"] });
  await page.waitForSelector("#txfeebutton");
  console.log("boom");
}

const setup = async () => {
  const page = await configureBrowser();
  await findFee(page);
  await page.browser().close();
};

setup();

When I inspect the page in a normal browser, the element definitely exists in the HTML, yet waitForSelector never finds it.


1 Answer


It works fine with a user agent string:

const puppeteer = require("puppeteer"); // ^19.0.0

let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  // A realistic user agent and Accept-Language header keep Cloudflare from flagging the headless browser
  const ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36";
  await page.setExtraHTTPHeaders({"Accept-Language": "en-US,en;q=0.9"});
  await page.setUserAgent(ua);
  const url = "https://etherscan.io/tx/0xb06c7d09611cb234bfcd8ccf5bcd7f54c062bee9ca5d262cc5d8f3c4c923bd32";
  await page.goto(url);
  const btn = await page.waitForSelector("#txfeebutton");
  console.log(await btn.evaluate(el => el.textContent.trim())); // => ($0.56)
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;

One debugging strategy for this is to run the same script with headless: false and see whether it works, then check page.content() when running headlessly. There you can see that Cloudflare is detecting your scraper and presenting a captcha.
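
For example, here is a minimal sketch of that check (same URL as above; the debug.png filename is just an example). With no user agent set, the dumped HTML and screenshot should show the Cloudflare challenge page rather than the transaction:

const puppeteer = require("puppeteer");

let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  // Deliberately no setUserAgent here, to reproduce the failure
  await page.goto("https://etherscan.io/tx/0xb06c7d09611cb234bfcd8ccf5bcd7f54c062bee9ca5d262cc5d8f3c4c923bd32");
  console.log(await page.content()); // the HTML the headless browser actually received
  await page.screenshot({path: "debug.png", fullPage: true}); // visual confirmation of the block
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());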

  • If I may pick your brain a little more: I intend to use an array of transaction hashes (the final piece of the URL) and loop over it to perform this scrape many times. Would you advise launching Puppeteer each loop, or doing it within the same launch and just using fresh .goto statements? The latter seems the obvious choice, but I don't want to run into any issues. – PeteG Jul 05 '22 at 15:05
  • I'd use `goto` statements. – ggorlen Jul 05 '22 at 15:06
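
For reference, a rough sketch of what the comments describe: one launch, one page, and a fresh goto per transaction hash (the hashes array here is just a placeholder):

const puppeteer = require("puppeteer");

const hashes = [
  "0xb06c7d09611cb234bfcd8ccf5bcd7f54c062bee9ca5d262cc5d8f3c4c923bd32",
  // ...more transaction hashes
];

let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  await page.setExtraHTTPHeaders({"Accept-Language": "en-US,en;q=0.9"});
  await page.setUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36");

  for (const hash of hashes) {
    await page.goto(`https://etherscan.io/tx/${hash}`);
    const btn = await page.waitForSelector("#txfeebutton");
    console.log(hash, await btn.evaluate(el => el.textContent.trim()));
  }
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());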