20

This is a recent problem; I think it began three or four days ago. It is not isolated to my own system, as I was also running the software on a remote server (Windows 10, Windows Server). It is also not isolated to any specific URL, as I can't get past any URL that has this check now.

Title: "Just a moment..." "Checking your browser before accessing URL". "This process is automatic. Your browser will redirect to your requested content shortly." "Please allow up to 5 seconds..." "DDos Protection by Cloudflare" "Ray Id: xxxxxxxxxxxxxxxxxx"

  • I've tried different systems (both Windows-based)
  • I've tried different drivers (gecko and chrome)
  • I've tried different URLs
from selenium import webdriver
driver = webdriver.Chrome()
# driver.get() needs a full URL including the scheme (https assumed here)
driver.get('https://www.etherdelta.com')

Does anyone know how I can resolve this; or is it time to put poor ol' timmy (the program) down?

gloomyfit

4 Answers

19

I had the same issue with Firefox. I was able to solve it by switching to Chrome.
Example code:

from selenium import webdriver
url = "<WEBSITE>"
options = webdriver.ChromeOptions()
options.add_argument("--disable-blink-features=AutomationControlled")
driver = webdriver.Chrome(options=options)
driver.get(url)

"--disable-blink-features=AutomationControlled" hides the "navigator.webdriver" flag.
See Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
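
One way to check whether the flag is actually hidden is to ask the page itself. This is a minimal sketch, assuming `driver` is any Selenium WebDriver that has already loaded a page; the helper name `webdriver_flag_exposed` is made up for illustration.

```python
def webdriver_flag_exposed(driver):
    """Return True if the page can still see navigator.webdriver as true."""
    # execute_script runs JavaScript in the current page and returns the result.
    value = driver.execute_script("return navigator.webdriver")
    # An automated Chrome normally reports true. With the flag hidden the
    # value should come back as false, or as None if it is undefined.
    return value is True
```

Call it right after `driver.get(url)`. If it returns True, the option did not take effect for that browser build.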

Edit

You also have to change some default variables of chromedriver.
Example with perl:

perl -pi -e 's/cdc_/dog_/g' /path/to/chromedriver

For more details look at the original post.
See Can a website detect when you are using selenium with chromedriver?
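
On a system without perl (the question mentions Windows), the same substitution can be sketched in Python. This is only an illustration of the perl one-liner above, not a different technique: the function name `patch_chromedriver` is hypothetical, and the path you pass in is whatever your chromedriver binary is. The replacement has the same length as the original string, so the file size and all offsets in the binary stay the same.

```python
from pathlib import Path


def patch_chromedriver(path, old=b"cdc_", new=b"dog_"):
    """Replace every occurrence of `old` with `new` in the binary at `path`.

    Returns the number of replacements made.
    """
    if len(old) != len(new):
        # Keep the file size and the offsets inside the binary unchanged.
        raise ValueError("old and new must have the same length")
    binary = Path(path)
    data = binary.read_bytes()
    count = data.count(old)
    if count:
        binary.write_bytes(data.replace(old, new))
    return count
```

Running it a second time on the same file should return 0, which is a quick way to confirm the patch was applied. Keep a backup copy of the original binary before patching.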

Edit 2

Cloudflare keeps adjusting their algorithm, so you could try undetected-chromedriver instead of patching chromedriver manually.

undetected-chromedriver is an optimized Selenium Chromedriver patch which should not trigger anti-bot services. It automatically downloads the driver binary and patches it.

Whether this works depends on the website and on the current state of development. Cloudflare seems to track the development of undetected-chromedriver.

import undetected_chromedriver as uc
url = "<WEBSITE>"
driver = uc.Chrome()
driver.get(url)
MrTiny
3

Try using your Chrome user data folder:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Configure browser: reuse the existing Chrome profile and hide the automation flag
options = webdriver.ChromeOptions()
options.add_argument("--user-data-dir=C:\\Users\\daria\\AppData\\Local\\Google\\Chrome\\User Data")
options.add_argument("--disable-blink-features=AutomationControlled")

# Selenium 4 style: the downloaded driver path is passed through Service
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)

# Keep the browser open until Enter is pressed
input("End?")
DARI HDEZ
  • This is crazy... this actually works! But why...???? – Mecanik Jan 13 '22 at 08:02
  • It seems to be because it runs your default Chrome installation, but it is still unclear why. – Mecanik Jan 13 '22 at 08:14
  • It seems to work on some sites, but doesn't work on others. – gloomyfit Jan 25 '22 at 11:14
  • @Mecanik The page detects that you are a bot (you are using Selenium) by reading your Chrome data, such as cookies and history. With the default Chrome folder, Selenium opens your Chrome window with all your data (user account, history, cookies, etc.), so the page thinks you are a real user. – DARI HDEZ Jan 26 '22 at 09:20
  • @gloomyfit There can be many reasons why a website is blocking you: IP blocking, cookies, behavior within the website, excessive number of requests in a short time, etc. Every website needs its own web scraping code. – DARI HDEZ Jan 26 '22 at 09:22
  • This works amazingly well. Saved me a lot of time trying to troubleshoot. – james-see Feb 07 '22 at 19:30
3

I had the same problem when using headless Selenium on a Docker Linux image.

I solved it by creating a virtual display right before calling the webdriver:

from pyvirtualdisplay import Display
display = Display(visible=0, size=(800, 800))  
display.start()

Don't forget to install both pyvirtualdisplay and xvfb: pip install pyvirtualdisplay and sudo apt-get install xvfb

You must also remove the "headless" option from ChromeDriver. Here is the complete code I use:

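    from pyvirtualdisplay import Display
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    # Start a virtual display in order to avoid Cloudflare bot detection
    display = Display(visible=0, size=(800, 800))
    display.start()

    options = webdriver.ChromeOptions()
    options.add_argument('--no-sandbox')
    options.add_argument('start-maximized')
    options.add_argument('enable-automation')
    options.add_argument('--disable-infobars')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--disable-browser-side-navigation')
    options.add_argument("--remote-debugging-port=9222")
    # options.add_argument("--headless")  # must stay disabled
    options.add_argument('--disable-gpu')
    options.add_argument("--log-level=3")
    # Selenium 4 style: driver path goes through Service, and
    # options= replaces the deprecated chrome_options=
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)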

Since it was working nicely without headless on my local computer, I figured emulating a real display might do the job as well. I do not really understand why, but from what I've understood, Cloudflare tries to execute JavaScript code in order to confirm you're not a bot. Having an emulated display for the webpage helps it do so.
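
If the same script runs both locally and in Docker, the virtual display can be started only where it is needed. This is a rough sketch under one assumption that is not from the answer above: that a missing `DISPLAY` environment variable on Linux is a good enough sign that no real display exists. The helper name `needs_virtual_display` is made up.

```python
import os
import sys


def needs_virtual_display(platform=None, environ=None):
    """Guess whether a virtual display should be started before the browser."""
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ
    # Xvfb is an X11 tool, so this only applies to Linux. A non-empty
    # DISPLAY means an X server is already available to the browser.
    return platform.startswith("linux") and not environ.get("DISPLAY")
```

The `Display(visible=0, size=(800, 800))` and `display.start()` lines can then be wrapped in `if needs_virtual_display():` so that a desktop machine keeps using its real screen.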

Jules Civel
1

It is because the website uses Cloudflare to protect itself from DDoS (Distributed Denial of Service) attacks. There are two ways to solve this problem:

  1. Use time.sleep -- if it takes 5 seconds for the webpage to load, just use time.sleep(5).

  2. Use WebDriverWait -- for example, suppose a button with id "sample-btn" appears only after this screen. Then you can do:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Waits up to 10 seconds for the element to be present in the DOM
btn = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'sample-btn')))

The second one is recommended. If it doesn't work for you, go with the first one. Hope this helps!
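
The difference between the two options is that a polling wait returns as soon as the page is ready, while a fixed sleep always waits the full time. The sketch below shows that polling idea in plain Python, without Selenium. The names `wait_until` and `challenge_passed` are hypothetical, and using the title "Just a moment..." from the question as the signal is an assumption. It only helps if the check eventually clears on its own.

```python
import time

# Title of the Cloudflare check page as quoted in the question.
CHALLENGE_TITLE = "Just a moment..."


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


def challenge_passed(get_title):
    """True once the page title is no longer the Cloudflare check title."""
    return get_title() != CHALLENGE_TITLE
```

With a real driver this could be called as `wait_until(lambda: challenge_passed(lambda: driver.title), timeout=10)`, since `driver.title` returns the current page title.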

Sushil
  • I appreciate your contribution, but unfortunately neither approach works. Even if I kill the program (while keeping the browser open), I can't manually load any of these pages myself (just stuck at the same place). This is further confirmed by opening additional tabs that are at no point controlled by Selenium: they also remain stuck. Any browser that was not started by Selenium works normally, though. – gloomyfit Oct 02 '20 at 05:31
  • try using headers and user-agents – Abdul Rauf Oct 02 '20 at 09:27
  • Adding a user-agent did not work. Unsure what you mean regarding adding headers to Selenium. – gloomyfit Oct 03 '20 at 03:30