
(Node.js, server side only.)

I'm doing some web scraping, and some pages are protected by Cloudflare's anti-DDoS page, which I'm trying to bypass. Searching around, I found a lot of articles about the stealth method or reCAPTCHA. But Cloudflare doesn't even give me a CAPTCHA: it stays stuck on the "wait for 5 seconds" page and displays, in red, "TURN ON JAVASCRIPT AND RELOAD" and "TURN ON COOKIES AND RELOAD". JavaScript seems to be active, though, because my program runs on a lot of other websites and processes their JavaScript.

This is my code:

//vm = this;
vm.puppeteer.use(vm.StealthPlugin())
vm.puppeteer.use(vm.AdblockerPlugin({
  blockTrackers: true
}))
let browser = await vm.puppeteer.launch({
  headless: true
});
let browserPage = await browser.newPage();
await browserPage.goto(link, {
  waitUntil: 'networkidle2',
  timeout: 40 * 1000
});
await browserPage.waitForTimeout(20 * 1000);
let body = await browserPage.evaluate(() => {
  return document.documentElement.outerHTML;
});
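
One way to tell programmatically whether `body` is still the challenge page is to look for the red messages quoted above. This is only a sketch: `looksLikeChallengePage` is a hypothetical helper, and the marker strings are taken from the messages as quoted in this question, so the exact wording in Cloudflare's HTML may differ.

```javascript
// Hypothetical helper: returns true when the HTML still contains one of the
// challenge messages. The marker strings are assumptions copied from the
// messages quoted in the question, not from Cloudflare documentation.
function looksLikeChallengePage(html) {
  const markers = [
    'turn on javascript and reload',
    'turn on cookies and reload',
  ];
  // Compare case-insensitively, since the page shows the text in capitals.
  const lower = String(html).toLowerCase();
  return markers.some((marker) => lower.includes(marker));
}

// Usage with the code above:
//   if (looksLikeChallengePage(body)) { /* still blocked, retry or give up */ }
```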

I also tried removing StealthPlugin and AdblockerPlugin, but Cloudflare keeps telling me that JavaScript and cookies are turned off.

Can anyone help me, please?

Thibaud
  • What happens if you run this headful? Any js-related errors in console? – Vaviloff Oct 12 '20 at 08:57
  • With headless: true commented out, I get exactly the same thing. With headless: false, I get an error (UnhandledPromiseRejectionWarning: Error: Failed to launch the browser process! TROUBLESHOOTING), but a tab opens in my Google Chrome – Thibaud Oct 12 '20 at 09:12
  • I cleaned up my setup, and now with headless: false a tab opens in the browser and there is no red message, but when Cloudflare reloads after the 5 seconds, it reloads onto itself rather than the website I want – Thibaud Oct 12 '20 at 09:28
  • Any news on this? I'm facing the same issue, except I never got the "Turn on Javascript" message – Maskim Feb 18 '21 at 15:33
  • I tried modifying the headers, but that doesn't seem to work. From talking with some people, this seems to be normal: Cloudflare tries to catch bots, so even if you find a solution, it risks being temporary (a cat-and-mouse game) – Thibaud Feb 18 '21 at 18:26
  • Same issue here. Request from Postman always results in the "enable cookies" message, even when copying all the request headers – pete Jul 15 '23 at 02:35

1 Answer


Setting your own User-Agent and Accept-Language header should work, because your headless browser needs to look like a real person browsing.

You can use page.setExtraHTTPHeaders() and page.setUserAgent() to do so.

await browserPage.setExtraHTTPHeaders({
 'Accept-Language': 'en'
});
// You can use any UserAgent you want
await browserPage.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36');
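
To make the order of the calls explicit, the two settings can be wrapped in a small function that runs before the first `goto`. This is a minimal sketch: `applyBrowserIdentity` is a hypothetical name, not part of Puppeteer, and it only relies on the `setUserAgent()` and `setExtraHTTPHeaders()` page methods mentioned above. Whether this is enough to get past the Cloudflare page is not guaranteed.

```javascript
// Hypothetical helper (not a Puppeteer API): gives the page a browser-like
// identity. Call it before the first navigation so the very first request
// already carries the custom User-Agent and Accept-Language header.
async function applyBrowserIdentity(page, options = {}) {
  const {
    // Defaults are the values from the snippet above; any realistic
    // User-Agent string can be passed instead.
    userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    acceptLanguage = 'en',
  } = options;

  await page.setUserAgent(userAgent);
  await page.setExtraHTTPHeaders({ 'Accept-Language': acceptLanguage });

  // Return what was applied, which is handy for logging.
  return { userAgent, acceptLanguage };
}

// Usage with the code from the question:
//   let browserPage = await browser.newPage();
//   await applyBrowserIdentity(browserPage);
//   await browserPage.goto(link, { waitUntil: 'networkidle2', timeout: 40 * 1000 });
```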
Matt
ErcouldnT
  • Please try to add a code example or a more detailed step-by-step, so the fellow can replicate your suggestion more clearly. – Evan P Aug 08 '21 at 15:31
  • Well, that doesn't work; it keeps getting stuck on the Cloudflare page – Thibaud Feb 22 '22 at 13:11