I'm trying to use Puppeteer to load a page that's protected by Cloudflare. In headless mode, the request gets a 403 and the page presents a captcha. In non-headless mode, the same request gets a 200 and the page loads correctly.

I'm trying to understand what differs in the non-headless load that allows the page to load correctly.

I thought it might be some kind of local JavaScript execution on the page, but I've disabled JavaScript in both modes. I've also ruled out IP blocking and rate limiting: all tests were run from the same personal machine, at least 1 minute apart.

Here's the code:

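const puppeteer = require('puppeteer');

(async () => {
  // Toggle to compare: true gets the 403 and captcha, false gets the 200.
  const headless = true;
  const browser = await puppeteer.launch({headless});
  const page = await browser.newPage();
  // Disabled to rule out in-page JavaScript as the cause of the difference.
  await page.setJavaScriptEnabled(false);

  await page.goto('https://angel.co/company/angellist/jobs');
  await page.screenshot({path: 'out.png'});

  await browser.close();
})();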

And the results:

For headless = false: [screenshot: not headless]

For headless = true: [screenshot: headless]
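One way to narrow this down might be to compare what the browser sends in each mode. The sketch below is not a fix, only a diagnostic: it loads the same URL headless and non-headless, then prints the status codes and any request headers that differ. The names `diffHeaders`, `probe`, and `compare` are hypothetical helpers made up for this example. It also assumes that the headers Puppeteer reports via `request.headers()` are close to what actually goes over the wire, which may not hold for headers added at the network layer. One difference that may show up is `User-Agent`, since headless Chrome reports itself as `HeadlessChrome` by default.

```javascript
// Hypothetical helper: return the headers whose values differ between two runs.
function diffHeaders(headlessHeaders, headfulHeaders) {
  const keys = new Set([
    ...Object.keys(headlessHeaders),
    ...Object.keys(headfulHeaders),
  ]);
  const diff = {};
  for (const key of keys) {
    if (headlessHeaders[key] !== headfulHeaders[key]) {
      diff[key] = {headless: headlessHeaders[key], headful: headfulHeaders[key]};
    }
  }
  return diff;
}

// Hypothetical helper: load the URL in one mode and record the status code
// and the headers Puppeteer reports for the main document request.
async function probe(url, headless) {
  // Required lazily so diffHeaders can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({headless});
  try {
    const page = await browser.newPage();
    await page.setJavaScriptEnabled(false);
    const response = await page.goto(url);
    if (response === null) {
      throw new Error('Navigation returned no response');
    }
    return {status: response.status(), headers: response.request().headers()};
  } finally {
    await browser.close();
  }
}

// Run both modes back to back and print what differs.
async function compare(url) {
  const headlessRun = await probe(url, true);
  const headfulRun = await probe(url, false);
  console.log('status:', headlessRun.status, 'vs', headfulRun.status);
  console.log(diffHeaders(headlessRun.headers, headfulRun.headers));
}
```

Calling `compare('https://angel.co/company/angellist/jobs')` would run both loads. If the header diff comes back empty, the difference is probably somewhere this sketch does not look, such as the TLS handshake or headers added below Puppeteer's view.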

Patrick Perini