
I am trying to use Puppeteer to extract the innerHTML value from a button on a webpage. For now, I am simply trying to await the appearance of the selector so that I can then work with it.

On running the code below, the program times out while waiting for the selector.

const puppeteer = require("puppeteer");

const link =
  "https://etherscan.io/tx/0xb06c7d09611cb234bfcd8ccf5bcd7f54c062bee9ca5d262cc5d8f3c4c923bd32";

async function configureBrowser() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(link);

  return page;
}

async function findFee(page) {
  await page.reload({ waitUntil: ["networkidle0", "domcontentloaded"] });
  await page.waitForSelector("#txfeebutton");
  console.log("boom");
}

const setup = async () => {
  const page = await configureBrowser();
  await findFee(page);
  await page.browser().close();
};

setup();

When I inspect the page in a normal browser, the element definitely exists in the HTML, yet waitForSelector never finds it.


1 Answer


It works fine with a user agent string:

const puppeteer = require("puppeteer"); // ^19.0.0

let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  // A realistic user agent and Accept-Language header keep Cloudflare from flagging the headless browser
  const ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36";
  await page.setExtraHTTPHeaders({"Accept-Language": "en-US,en;q=0.9"});
  await page.setUserAgent(ua);
  const url = "https://etherscan.io/tx/0xb06c7d09611cb234bfcd8ccf5bcd7f54c062bee9ca5d262cc5d8f3c4c923bd32";
  await page.goto(url);
  const btn = await page.waitForSelector("#txfeebutton");
  console.log(await btn.evaluate(el => el.textContent.trim())); // => ($0.56)
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;

One debugging strategy for this is to run the same script with headless: false and see whether it works, then check page.content() when running headlessly. There you can see that Cloudflare is detecting your scraper and presenting a captcha.
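
For example, here is a minimal sketch of that check (same URL as above; the debug.png filename is just an example). With no user agent set, the dumped HTML and screenshot should show the Cloudflare challenge page rather than the transaction:

const puppeteer = require("puppeteer");

let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  // Deliberately no setUserAgent here, to reproduce the failure
  await page.goto("https://etherscan.io/tx/0xb06c7d09611cb234bfcd8ccf5bcd7f54c062bee9ca5d262cc5d8f3c4c923bd32");
  console.log(await page.content()); // the HTML the headless browser actually received
  await page.screenshot({path: "debug.png", fullPage: true}); // visual confirmation of the block
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());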

  • If I may pick your brain a little more: I intend to use an array of transaction hashes (the final piece of the URL) and loop over it to perform this scrape many times. Would you advise launching Puppeteer each loop, or doing it within the same launch and just using fresh .goto statements? The latter seems the obvious choice, but I don't want to run into any issues. – PeteG Jul 05 '22 at 15:05
  • I'd use `goto` statements. – ggorlen Jul 05 '22 at 15:06
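
For reference, a rough sketch of what the comments describe: one launch, one page, and a fresh goto per transaction hash (the hashes array here is just a placeholder):

const puppeteer = require("puppeteer");

const hashes = [
  "0xb06c7d09611cb234bfcd8ccf5bcd7f54c062bee9ca5d262cc5d8f3c4c923bd32",
  // ...more transaction hashes
];

let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  await page.setExtraHTTPHeaders({"Accept-Language": "en-US,en;q=0.9"});
  await page.setUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36");

  for (const hash of hashes) {
    await page.goto(`https://etherscan.io/tx/${hash}`);
    const btn = await page.waitForSelector("#txfeebutton");
    console.log(hash, await btn.evaluate(el => el.textContent.trim()));
  }
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());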