1

I recently started testing Selenium for some personal projects, and one problem I ran into was being banned from some websites due to reCAPTCHA v3 tests. After some research I found the reCAPTCHA v3 demo, experimented with it, and eventually wrote this:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36")

driver = webdriver.Chrome(options=options, executable_path=ChromeDriverManager().install())
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
  "source": """
    Object.defineProperty(navigator, 'webdriver', {
      get: () => undefined
    })
  """
})

driver.get("https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php")
WebDriverWait(driver, 10).until(EC.title_contains("Index"))

I have looked at various Stack Overflow questions, including the following:

Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

Can a website detect when you are using selenium with chromedriver?

How does recaptcha 3 know I'm using selenium/chromedriver?

and more

While the added arguments do help improve the reCAPTCHA v3 score, it is still extremely inconsistent: about half the time I receive a passing score of .7, and the other half I receive a failing score of .1.

Please help me improve my reCAPTCHA scores so that I pass consistently.

EDIT 1: Signing into a Google account in the Chrome instance often changes the results of the demo, but it still does not entirely prevent failing scores.

Chris Yun
  • The whole point of Recaptcha is to prevent automation. Perhaps the inconsistent Recaptcha score means Recaptcha is actually working as intended. – CEH Jan 20 '20 at 21:42
  • @Christine I understand this, but the whole point of this project is to find a way around reCAPTCHA so I can continue to scrape and navigate the reCAPTCHA-protected pages. – Chris Yun Jan 20 '20 at 21:45
  • Please be a good internet citizen... if the site doesn't want you scraping, do not scrape it. It's likely the collection of data there is the site owner's protected intellectual property, and you could be breaking the law by attempting to create a whole copy of it. – pcalkins Jan 20 '20 at 21:58
  • @pcalkins I have no harmful intentions, nor am I copying anything; this entire project is for educational purposes. However, with the introduction of reCAPTCHA I have become increasingly curious about how it works and how to bypass it. – Chris Yun Jan 20 '20 at 22:35
  • Some of the new captchas capture behavior data from different parts of the site to build a sort of profile of the user. So it's not just a score resulting from a single page or hit to a site, but from a pattern of behavior... Some sites will just detect or prevent webdriver straight away by checking for script injection. (I think they store a sort of "clean state" hash and check that.) – pcalkins Jan 20 '20 at 22:51
  • @pcalkins I have heard about this, but many have told me that it is connected to Google account activity. In all my tests I haven't been signed into Google, and when I run the tests on a normal Chrome browser I consistently receive .7 or .9. – Chris Yun Jan 20 '20 at 23:43

3 Answers

1

To raise your score from .7 to something higher, such as .9, you can change the User-Agent header through execute_cdp_cmd() as follows:

driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browserClientA"}})

If necessary, you can switch between several values by issuing the command again with a different User-Agent:

driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browserClientA"}})
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browserClientB"}})
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browserClientC"}})
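
The comments on this answer suggest iterating through a list of user agents instead of reusing one. A minimal sketch of that idea follows. The `USER_AGENTS` list, `build_header_payload`, and `apply_next_user_agent` are hypothetical names used for illustration, the strings are the placeholders from this answer, and `driver` is assumed to be a Chromium-based Selenium WebDriver. Note that the asker reports in the comments that rotation reduced, but did not eliminate, failing scores.

```python
from itertools import cycle

# Placeholder strings taken from the answer; substitute real UA strings.
USER_AGENTS = ["browserClientA", "browserClientB", "browserClientC"]


def build_header_payload(user_agent):
    """Build the parameter dict for the CDP command Network.setExtraHTTPHeaders."""
    return {"headers": {"User-Agent": user_agent}}


def apply_next_user_agent(driver, agents):
    """Send the next user agent from `agents` (an iterator) to the browser.

    `driver` is assumed to expose execute_cdp_cmd(), as Chromium-based
    Selenium drivers do. Returns the user agent that was applied.
    """
    user_agent = next(agents)
    driver.execute_cdp_cmd("Network.setExtraHTTPHeaders",
                           build_header_payload(user_agent))
    return user_agent


# cycle() starts over from the first entry once the list is exhausted.
agent_rotation = cycle(USER_AGENTS)
```

Calling `apply_next_user_agent(driver, agent_rotation)` before each `driver.get(...)` would then apply a different header value per page load.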

Solution

So effectively your working solution would be:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
      "source": """
        Object.defineProperty(navigator, 'webdriver', {
          get: () => undefined
        })
      """
    })
    driver.execute_cdp_cmd("Network.enable", {})
    driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browser1"}})
    driver.get("https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "li.step3 pre.response"))).get_attribute("innerHTML"))
    
  • Console Output:

    DevTools listening on ws://127.0.0.1:53748/devtools/browser/eac086e8-f1c0-42d3-8ef8-d132f4b4c82b
    {
      "success": true,
      "hostname": "recaptcha-demo.appspot.com",
      "challenge_ts": "2020-01-20T22:31:32Z",
      "apk_package_name": null,
      "score": 0.9,
      "action": "examples/v3scores",
      "error-codes": []
    }
    
  • Console Snapshot:

recaptcha3_score
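
To check the score programmatically rather than reading it off the console, the JSON printed above can be parsed. This is a minimal sketch, assuming the `innerHTML` of `pre.response` is the JSON shown in the console output (possibly HTML-escaped). The helpers `extract_score` and `is_passing` are hypothetical names, and the 0.7 threshold is an assumption taken from the passing score mentioned in the question.

```python
import html
import json


def extract_score(inner_html):
    """Return the reCAPTCHA v3 score from the demo's JSON response text."""
    # innerHTML may contain HTML entities (e.g. &quot;), so unescape first.
    payload = json.loads(html.unescape(inner_html))
    return float(payload["score"])


def is_passing(score, threshold=0.7):
    """Treat scores at or above `threshold` as passing (assumed cutoff)."""
    return score >= threshold


# Sample copied from the console output shown in this answer.
sample = """
{
  "success": true,
  "hostname": "recaptcha-demo.appspot.com",
  "challenge_ts": "2020-01-20T22:31:32Z",
  "apk_package_name": null,
  "score": 0.9,
  "action": "examples/v3scores",
  "error-codes": []
}
"""
score = extract_score(sample)
print(score, is_passing(score))
```

Collecting these values over repeated page loads would give a pass rate, which is a more useful measure of consistency than a single run.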

undetected Selenium
  • This helps. Whenever I first do the demo I always pass; however, when refreshing the demo page multiple times on the same Chrome instance, the score is still inconsistent, fluctuating between .1 and .7. I am not so much looking to raise the score to .9; rather, I need help producing a .7 consistently every time the demo page is refreshed. Thank you for the help, though. – Chris Yun Jan 20 '20 at 23:13
  • @ChrisYun As I mentioned, you shouldn't keep the same UA for back-to-back executions. You have to change it (maybe iterate through a list) to keep your score high. – undetected Selenium Jan 20 '20 at 23:17
  • Upon second inspection I found that even rotating user agents still does not prevent failing scores. Is there any known method of completely avoiding failing scores? – Chris Yun Jan 20 '20 at 23:38
  • Additionally, even the first demo is starting to fail. I added an additional section that iterates through random user agents after each demo; however, I still fail sometimes. – Chris Yun Jan 21 '20 at 00:03
  • The downvote placed on your answer is not mine, I simply removed the checkmark. Also I thought that the first demo would always pass however, with more testing the results seemed to fluctuate. While the user agent rotation does drastically reduce the chance of failure, it does not completely eliminate it, which is what I am trying to do – Chris Yun Jan 21 '20 at 16:02
  • @ChrisYun Check out the updated answer and let me know the status. – undetected Selenium Jan 22 '20 at 11:29
  • Changing UA is far too basic to make a difference IMHO. – pguardiario Jan 22 '20 at 12:41
  • Unfortunately, I have already tried this edit and am still faced with failing scores. If anything, this edit made the browsers consistently fail the demo. – Chris Yun Jan 22 '20 at 20:13
0

Nobody except Google really knows how these are scored, but I think we can guess at some obvious factors:

  • residential / business IP vs. datacenter IP

  • Google / OAuth cookies

  • obvious things like user agent and browser fingerprinting.

HTH.

pguardiario
0

If you can scrape the pages without JavaScript, then disabling JavaScript while you scrape might do the trick.
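
One common way to disable JavaScript in Chrome under Selenium is a content-settings preference. The sketch below assumes Chrome honors the `profile.managed_default_content_settings.javascript` preference, with 2 meaning "block". The helpers `no_javascript_prefs` and `make_options` are hypothetical names. It only helps on pages whose content is served without JavaScript, and whether a given site accepts such requests is not something this answer establishes.

```python
def no_javascript_prefs():
    """Chrome preferences that block JavaScript (2 = block, 1 = allow)."""
    return {"profile.managed_default_content_settings.javascript": 2}


def make_options():
    """Build ChromeOptions with JavaScript disabled.

    Selenium is imported here so the module itself loads without it.
    """
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", no_javascript_prefs())
    return options
```

The result of `make_options()` can be passed as `options=` when creating the Chrome driver, in the same way as the options object in the question.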
