I'm trying to use Puppeteer to load a page that's protected by Cloudflare. When it loads in headless mode, it gets a 403 and is presented with a captcha. However, when it loads in non-headless mode, it receives a 200 and the page loads correctly.
I'm trying to understand what happens during the non-headless load that allows the page to load correctly.
I thought it might be some kind of local JavaScript execution on the page, but I've disabled JavaScript in all cases. I've also ruled out IP blocking and rate limiting: all tests were conducted on the same personal machine, at least 1 minute apart.
Here's the code:
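const puppeteer = require('puppeteer');

(async () => {
  // Set to true to reproduce the 403 + captcha, false to get the 200
  const headless = true;
  const browser = await puppeteer.launch({headless});
  const page = await browser.newPage();
  // JavaScript is disabled in both runs
  await page.setJavaScriptEnabled(false);
  await page.goto('https://angel.co/company/angellist/jobs');
  await page.screenshot({path: 'out.png'});
  await browser.close();
})();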
And the results:
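Separately from the screenshots, a possible next diagnostic step is to compare what each mode actually sends. The sketch below is not part of the original test. It assumes that headless Chrome's default user agent carries a `HeadlessChrome` token (this has been the case for classic headless mode and may differ across Chrome versions), and whether Cloudflare keys on that token is also an assumption. `looksHeadless` and `probe` are hypothetical helper names introduced only for illustration.

```javascript
// Hypothetical helper: true when a user-agent string contains the
// "HeadlessChrome" token (assumed default for headless Chrome).
function looksHeadless(userAgent) {
  return /HeadlessChrome/.test(userAgent);
}

// Hypothetical helper: launches Puppeteer in the given mode and reports the
// browser's user agent together with the HTTP status of the main document.
async function probe(url, headless) {
  const puppeteer = require('puppeteer'); // loaded only when probe() is called
  const browser = await puppeteer.launch({headless});
  try {
    const page = await browser.newPage();
    await page.setJavaScriptEnabled(false);
    const response = await page.goto(url);
    const userAgent = await browser.userAgent();
    return {
      headless,
      // goto() can resolve to null, so guard before reading the status
      status: response ? response.status() : null,
      userAgent,
      looksHeadless: looksHeadless(userAgent),
    };
  } finally {
    await browser.close();
  }
}
```

Calling `probe('https://angel.co/company/angellist/jobs', true)` and then the same with `false`, and logging both results, would show whether the user agent is the only visible difference between the two runs. If it is, overriding it with `page.setUserAgent(...)` in headless mode would be one way to test that hypothesis.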