I would like to use chromedriver to scrape some stories from fanfiction.net. I tried the following:

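    from selenium import webdriver
    import time

    # Raw string so the backslashes in the Windows path are not treated as escapes
    path = r'D:\chromedriver\chromedriver.exe'

    # Positional driver path as in the original post (Selenium 3 style);
    # Selenium 4.10+ expects webdriver.Chrome(service=Service(path)) instead
    browser = webdriver.Chrome(path)
    url1 = 'https://www.fanfiction.net/s/8832472'
    url2 = 'https://www.fanfiction.net/s/5218118'

    browser.get(url1)
    time.sleep(5)
    browser.get(url2)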

The first link opens (sometimes I have to wait 5 seconds). When I try to load the second URL, Cloudflare intervenes and asks me to solve captchas, which cannot be solved (at least, Cloudflare does not accept the solutions). The same thing happens if I enter the links manually in the chromedriver-controlled browser window (that is, in the GUI). However, if I do the same in normal Chrome, everything works fine (I do not even get the waiting period on the first link), even in private mode with all cookies deleted. I could reproduce this on several machines.

Now my question: my intuition was that chromedriver is just the normal Chrome browser, made controllable. What is the difference from normal Chrome, how does Cloudflare distinguish the two, and how can I mask my chromedriver as normal Chrome? (I do not intend to load many pages in a very short time, so it should not look like a bot.) I hope my question is clear.

undetected Selenium
Tamar
  • This is bot detection on the site (I don't think Cloudflare in particular has anything to do with it; it is a feature that sites can use). The difference is that in one case injections are made into the DOM, so that session can be identified as "bot-controlled" and the captcha will never solve in it. If you start your browser manually, the site does not detect you as a bot (because you aren't one) and you can solve the captcha if it appears. (chromedriver launches and then interacts with the browser; it is a separate executable, and the driver and browser communicate over localhost.) – pcalkins Jan 08 '21 at 21:22

1 Answer

This error message...

Checking your browser before accessing

...implies that Cloudflare has classified your requests to the website as coming from an automated bot and is therefore denying you access to the application.
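
One signal often mentioned in this context is the `navigator.webdriver` property, which the W3C WebDriver specification requires an automated browser to expose to page scripts. The sketch below only shows how to read that flag from a Selenium session. Whether Cloudflare relies on this particular signal is not confirmed anywhere in this thread, and the helper names `automation_flag` and `main` are hypothetical.

```python
def automation_flag(driver):
    """Return navigator.webdriver as page scripts would see it."""
    # execute_script runs JavaScript in the current page and returns the result.
    return driver.execute_script("return navigator.webdriver")


def main():
    # Imported here so the helper above does not require Selenium to be installed.
    from selenium import webdriver

    # Assumption: Selenium can locate a matching chromedriver on its own
    # (Selenium 4.6+ does this via Selenium Manager).
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.fanfiction.net/s/8832472")
        print("navigator.webdriver =", automation_flag(driver))
    finally:
        driver.quit()
```

Calling `main()` opens the first story URL from the question and prints the flag. Running the same one-line script in the DevTools console of a manually started Chrome gives a value to compare against.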


Solution

In these cases, a potential solution is to use undetected-chromedriver to initialize the Chrome browsing context.

undetected-chromedriver is an optimized Selenium chromedriver patch that is designed not to trigger anti-bot services such as Distill Network / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.

  • Code Block:

    import undetected_chromedriver as uc
    import time

    # Use the package's own ChromeOptions class together with uc.Chrome
    options = uc.ChromeOptions()
    options.add_argument("--start-maximized")
    driver = uc.Chrome(options=options)  # downloads and patches the driver binary
    url1 = 'https://www.fanfiction.net/s/8832472'
    url2 = 'https://www.fanfiction.net/s/5218118'
    driver.get(url1)
    time.sleep(5)  # same 5-second pause as in the question
    driver.get(url2)
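
The fixed `time.sleep(5)` could be replaced by polling until the interstitial is gone. This is a minimal sketch, assuming the "Checking your browser before accessing" text quoted above appears in `driver.page_source` while the check is running; that wording is controlled by Cloudflare and may differ or change. The helper `wait_until_challenge_clears` is hypothetical and uses a plain polling loop, so it needs nothing beyond the driver object (Selenium's `WebDriverWait` would serve the same purpose).

```python
import time

# Assumption: this text is present in the page while the browser check runs.
CHALLENGE_TEXT = "Checking your browser before accessing"


def wait_until_challenge_clears(driver, timeout=30.0, poll=0.5):
    """Poll driver.page_source until CHALLENGE_TEXT is gone.

    Returns True once the text has disappeared, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if CHALLENGE_TEXT not in driver.page_source:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

With the `driver` from the code block above, this would be called as `wait_until_challenge_clears(driver)` between `driver.get(url1)` and `driver.get(url2)`. It only waits for the check to pass; it does nothing if a captcha is shown.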
    

undetected Selenium
  • Thanks, that did the trick! Just out of curiosity, how exactly do they (Cloudflare) recognize that I am using bot software? – Tamar Jan 09 '21 at 10:24
  • @Tamar https://stackoverflow.com/questions/33225947/can-a-website-detect-when-you-are-using-selenium-with-chromedriver/41220267 – iMath Nov 17 '21 at 10:59
  • Today undetected_chromedriver is detected by Cloudflare in my case. Is there an update for this? Thanks – Pawel W Jan 12 '23 at 19:02
  • @PawelW There are so many other approaches to avoid detection. – undetected Selenium Jan 13 '23 at 00:02