
How can I use Puppeteer to access a site that is protected by Cloudflare? The problem is not that Cloudflare blocks my connection or shows a CAPTCHA. In a regular browser it works like this: I enter the site address, land on the Cloudflare page, and after a couple of seconds I am redirected to the site itself. With Puppeteer, the redirect never happens. However, if I launch with `"headless": false` and watch the Chromium address bar closely, I can see the URL change for a split second, but ultimately no redirect occurs.

const puppeteer = require('puppeteer');

(async () => {
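  // Launch a visible (non-headless) browser so the address bar can be observed.
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://www.orrick.com/');
  // Wait for the site's own `.homepage` element, which should appear only after the Cloudflare page redirects.
  await page.waitForSelector('.homepage');
  await page.screenshot({ path: 'example.png' });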

  await browser.close();
})();

Video (speed 0.1): https://gfycat.com/circularfearlessgemsbuck
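
To capture the brief address-bar change without relying on a slowed-down video, one option is to log every main-frame navigation and every navigation response. The following is only a diagnostic sketch, not a way around the Cloudflare check. The helper names `hasChallengeToken` and `attachNavigationLogger` are hypothetical, the 10-second wait is an arbitrary assumption, and `__cf_chl_rt_tk` is simply the query parameter that was seen flashing in the address bar.

```javascript
// Diagnostic sketch: record main-frame navigations and navigation responses.
// `hasChallengeToken` and `attachNavigationLogger` are hypothetical helper names.

// Returns true if the URL carries the `__cf_chl_rt_tk` query parameter.
function hasChallengeToken(url) {
  try {
    return new URL(url).searchParams.has('__cf_chl_rt_tk');
  } catch (err) {
    return false; // not a parseable absolute URL
  }
}

// Attaches listeners to a Puppeteer Page and returns the array that collects entries.
function attachNavigationLogger(page) {
  const entries = [];
  page.on('framenavigated', (frame) => {
    if (frame.parentFrame() !== null) return; // keep the main frame only
    const url = frame.url();
    entries.push({ type: 'navigated', url, challenge: hasChallengeToken(url) });
  });
  page.on('response', (response) => {
    // Navigation requests include iframe navigations as well.
    if (!response.request().isNavigationRequest()) return;
    entries.push({ type: 'response', url: response.url(), status: response.status() });
  });
  return entries;
}

// Not invoked automatically; call run() manually with puppeteer installed.
async function run() {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: false });
  try {
    const page = await browser.newPage();
    const entries = attachNavigationLogger(page);
    await page.goto('https://www.orrick.com/');
    // Assumed delay: give the Cloudflare page some time before dumping the log.
    await new Promise((resolve) => setTimeout(resolve, 10000));
    console.table(entries);
  } finally {
    await browser.close();
  }
}
```

Calling `run()` prints a table of the URLs and HTTP status codes the page went through, which should show whether the redirect was attempted and where it ended up.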

P.S. The application for which I am trying to use Puppeteer is written in PHP. I am ready to consider any alternatives to Puppeteer that provide similar functionality (web browser emulation) and are current as of 2022.

  • Does this answer your question? [Cloudflare bypass with Puppeteer](https://stackoverflow.com/questions/71923946/cloudflare-bypass-with-puppeteer) – filipe Aug 27 '22 at 06:23
  • @filipe No. 'When I run my program in "headless: false" it redirects to the page after 5 seconds.' - in my case, this redirect does not occur. – x2ADLcRoo33ZygUA Aug 27 '22 at 06:33
  • You could try [this](https://www.npmjs.com/package/puppeteer-extra-plugin-stealth), but ultimately you're fighting an uphill battle, and with good reason. Cloudflare's protections are there to defend against things like Puppeteer, and Cloudflare has far better resources to detect your kind of automation than those you have to defeat their protections. – filipe Aug 27 '22 at 06:38
  • Do you own the site you are trying to access? – AD7six Aug 27 '22 at 06:45
  • @filipe This doesn't work either. So you think CloudFlare is blocking my request? Again, it seems to me that puppeteer for some reason does not allow the redirect to be made. As you can see in the video, at some point the `__cf_chl_rt_tk` parameter appears in the address bar, but disappears almost immediately. It's like CloudFlare wants to redirect me, but puppeteer won't let me. – x2ADLcRoo33ZygUA Aug 27 '22 at 07:07
  • @AD7six No. `4 more to go...` – x2ADLcRoo33ZygUA Aug 27 '22 at 07:09
  • I'd say this is most likely the intended behavior, as per [this GitHub issue](https://github.com/puppeteer/puppeteer/issues/7006), not a problem with the library or the website. – filipe Aug 27 '22 at 07:10
  • @filipe Interesting. Instead of `https://www.orrick.com/` I tried another site that is protected by CloudFlare. It was successful, the CloudFlare page redirected! But `https://www.orrick.com/` is still a problem. – x2ADLcRoo33ZygUA Aug 27 '22 at 07:19
  • Cloudflare has different levels of protection. For example, my company uses Medium, but perhaps `orrick.com` is using High or even “I'm Under Attack!”. Or it could be that Cloudflare has increased the protection on `orrick.com` against you specifically since it's possibly tracking your IP and noticed you're repeatedly using automated tools against that particular website. – filipe Aug 27 '22 at 07:25
  • Ok, working as designed then :). If there’s a legitimate use case please contact the site owner. – AD7six Aug 27 '22 at 08:23
  • Did you find a solution for this? The same is happening to me – Alberto Espinoza Oct 14 '22 at 13:05
  • @AlbertoEspinoza Partly. After studying a lot of information on this topic, I came to the conclusion that trying to bypass Cloudflare on your own is not a very viable idea. So I used a web scraping service (I settled on ScrapFly). – x2ADLcRoo33ZygUA Nov 24 '22 at 22:07
  • I have found that running your instance on Linux can hurt your chances of getting past Cloudflare's security, even if you are using different proxies. Results are better on Windows. – Alberto Espinoza Dec 21 '22 at 17:16

0 Answers