I have a website to scrape, with around 10k-20k visits to the page daily. I have been doing this for more than a month, and until now everything was fine.

  1. I can access the site using a non-Selenium browser.
  2. Using the Selenium browser, Cloudflare now blocks me from visiting. Even after 3 to 5 clicks on the Cloudflare challenge, it still identifies me as a bot.
  3. However, in the same Selenium browser, if I open the website in a new tab, it works, although I still need to click the Cloudflare challenge once to reach the site.

Tried:

  1. Pass in user agent
  2. options.add_experimental_option('useAutomationExtension', False)
  3. options.add_argument('--disable-blink-features=AutomationControlled')
  4. Selenium headless: How to bypass Cloudflare detection using Selenium
  5. selenium_stealth (don't know whether in a correct way)
import undetected_chromedriver as uc
from selenium import webdriver


def getDriver():
    options = webdriver.ChromeOptions()
    # options.add_argument("--headless")  # enabling this gets me blocked

    browser = uc.Chrome(options=options, version_main=113)

    return browser
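For reference, here is roughly how I combined selenium-stealth with the other options I tried. The parameter values passed to `stealth()` are the ones from that library's README example, not values I have verified against Cloudflare, and the user agent string is just a placeholder for whatever current Chrome reports:

```python
import undetected_chromedriver as uc
from selenium_stealth import stealth


def get_stealth_driver():
    options = uc.ChromeOptions()
    # Options I already tried individually (items 1-3 above):
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/113.0.0.0 Safari/537.36')

    driver = uc.Chrome(options=options, version_main=113)

    # selenium-stealth patches navigator.* properties after the driver starts;
    # these argument values come straight from the library's README example.
    stealth(driver,
            languages=["en-US", "en"],
            vendor="Google Inc.",
            platform="Win32",
            webgl_vendor="Intel Inc.",
            renderer="Intel Iris OpenGL Engine",
            fix_hairline=True)
    return driver
```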

How can I modify this so that it works in headless mode?

Current setting: undetected chrome, version = 113

1 Answer

Use undetected-chromedriver instead. It is a Python library designed to make Chrome WebDriver stealthier, particularly for web scraping and automation tasks. It serves as a patch for Chromedriver so that it does not trigger anti-bot services such as Distill Network, Imperva, DataDome, or Botprotect.io, and it automatically downloads the driver binary and applies the necessary patches for improved undetectability.

Installation

pip install undetected-chromedriver

Usage

import undetected_chromedriver
driver = undetected_chromedriver.Chrome()
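For the headless part of your question: with Chrome 109+ the "new" headless mode (`--headless=new`) presents itself much more like a regular browser than the legacy `--headless` flag did, so it tends to survive detection better. A sketch, assuming Chrome/driver major version 113 as in your setup; behavior against Cloudflare can still vary by Chrome release:

```python
import undetected_chromedriver as uc

options = uc.ChromeOptions()
# Chrome's "new" headless mode (Chrome 109+) reports itself much more
# like a regular browser than the legacy --headless mode did.
options.add_argument('--headless=new')

# version_main pins the patched driver to the installed Chrome major version.
driver = uc.Chrome(options=options, version_main=113)
driver.get('https://example.com')
print(driver.title)
driver.quit()
```

There is no guarantee this defeats every Cloudflare configuration, but switching from the legacy headless flag to `--headless=new` is the first thing to try.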