25

Is there a way to make your Selenium script undetectable in Python using geckodriver?

I'm using Selenium for scraping. Are there any protections we need to use so websites can't detect Selenium?

Peter Mortensen

4 Answers

43

There are different methods to avoid websites detecting the use of Selenium.

  1. The value of navigator.webdriver is set to true by default when using Selenium. This property is present in Chrome as well as Firefox. It should report undefined instead of true to avoid detection.

  2. A proxy server can also be used to avoid detection.

  3. Some websites are able to use the state of your browser to determine if you are using Selenium. You can set Selenium to use a custom browser profile to avoid this.

The code below uses all three of these approaches.

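from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Selenium 3 style API: the firefox_profile and desired_capabilities keyword
# arguments were deprecated in Selenium 4 and removed in later 4.x releases.
# Reuse an existing Firefox profile so the session carries its history and extensions.
profile = webdriver.FirefoxProfile('C:\\Users\\You\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\something.default-release')

# Route HTTP traffic through a proxy (placeholder address and port).
PROXY_HOST = "12.12.12.123"
PROXY_PORT = "1234"
profile.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
profile.set_preference("network.proxy.http", PROXY_HOST)
profile.set_preference("network.proxy.http_port", int(PROXY_PORT))
# Ask Firefox not to expose navigator.webdriver to pages.
profile.set_preference("dom.webdriver.enabled", False)
profile.set_preference('useAutomationExtension', False)
profile.update_preferences()
desired = DesiredCapabilities.FIREFOX

driver = webdriver.Firefox(firefox_profile=profile, desired_capabilities=desired)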

Once the code is run, you will be able to manually check that the browser run by Selenium now has your Firefox history and extensions. You can also type "navigator.webdriver" into the devtools console to check that it is undefined.
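
The manual devtools check described above can also be done from the script. The following is only a minimal sketch: the helper names `read_webdriver_flag` and `looks_automated` are made up for illustration, and `driver` is assumed to be an already started Selenium WebDriver (any object with an `execute_script` method works).

```python
def read_webdriver_flag(driver):
    """Return navigator.webdriver as page scripts would see it."""
    # execute_script runs JavaScript in the current page and returns the result;
    # a JavaScript undefined (or null) value arrives in Python as None.
    return driver.execute_script("return navigator.webdriver")


def looks_automated(driver):
    """Return True only when the page would see navigator.webdriver === true."""
    return read_webdriver_flag(driver) is True
```

Calling `looks_automated(driver)` right after `driver.get(...)` gives a quick hint about whether the preference had any effect in your Firefox version. It only covers this one signal, so a False result does not mean the site cannot detect automation in other ways.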

CST
  • 1
    This was the only one solution that worked for me so far. Really thanks for share it! – btafarelo Sep 25 '20 at 16:13
  • What examples should I try with "DesiredCapabilities"? – Fandango68 Sep 29 '20 at 01:09
  • This solution worked for me, even without proxy. The only one problem was Firfox hanging during opening profile. So I changed first code row on: `profile = webdriver.FirefoxProfile()` – Yuri Jan 04 '21 at 20:39
  • @CST Could you please write the same thing for Google Chrome Browser in python selenium? That would be really helpful. – Harsh Vardhan Jan 07 '21 at 08:41
  • @HarshVardhan I have tried this, but unfortunately it seems this method does not work on Chrome. I believe Google are able to perform extra checks in Chrome that they can't perform in Firefox. There may be another way around reCAPTCHA in Chrome, but this method will not. – CST Jan 19 '21 at 10:19
  • @CST Mate this was the only thing that worked, thanks – Wboy Apr 23 '21 at 16:29
  • it worked for me, but with all the commands above and also Enhanced Tracking Protection OFF set in Firefox. – Cristiana SP Aug 16 '21 at 12:51
  • 3
    I tried this and it worked for older versions of firefox like 78.15.0esr but now that my browser got updated to 91.3.0esr, it doesnt. :( I keep getting 'forbidden request' – cheena Nov 29 '21 at 20:18
12

Whether a Selenium-driven Firefox / GeckoDriver session gets detected doesn't depend on any specific GeckoDriver or Firefox version. The websites themselves can inspect the network traffic and identify the browser client, i.e. the web browser, as WebDriver controlled.

As per the documentation of the WebDriver interface in the latest editor's draft of WebDriver - W3C Living Document, the webdriver-active flag, which is initially set to false, is set to true when the user agent is under remote control, i.e. when it is controlled through Selenium.

NavigatorAutomationInformation

Note that the NavigatorAutomationInformation interface should not be exposed on WorkerNavigator.

mixin NavigatorAutomationInformation

So,

webdriver
    Returns true if webdriver-active flag is set, false otherwise.

whereas,

navigator.webdriver
    Defines a standard way for co-operating user agents to inform the document that it is controlled by WebDriver, for example so that alternate code paths can be triggered during automation.

So, the bottom line is:

Selenium identifies itself


However, some generic approaches to avoid getting detected while web scraping are as follows:

undetected Selenium
  • I am not sure I understand this. Where is this flag exposed? In the http-request? Part of the user-agent string? Can it be changed? – d-b Dec 04 '20 at 10:09
  • @d-b The site can run client-side JavaScript which evaluates the variables and exposes the browser setting. Runs for every visitor, but isn't an issue for legit user activity. – Joel Wigton Oct 19 '21 at 15:24
  • @undetectedSelenium please can u help me with https://stackoverflow.com/questions/72375645/catching-selenium-error-before-it-crashes-in-python?noredirect=1#comment127859410_72375645 – S Mev May 25 '22 at 13:07
1

As per the current WebDriver W3C Editor's Draft specification:

The webdriver-active flag is set to true when the user agent is under remote control. It is initially false.

Hence, the readonly boolean attribute webdriver returns true if webdriver-active flag is set, false otherwise.

The specification further clarifies:

navigator.webdriver Defines a standard way for co-operating user agents to inform the document that it is controlled by WebDriver, for example so that alternate code paths can be triggered during automation.


There have been many discussions requesting the feature "option to disable navigator.webdriver == true", and @whimboo concluded in his comment that:

that is because the WebDriver spec defines that property on the Navigator object, which has to be set to true when tests are running with webdriver enabled:

https://w3c.github.io/webdriver/#interface

Implementations have to be conformant to this requirement. As such we will not provide a way to circumvent that.


Generic Conclusion

From the above discussions it can be concluded that:

Selenium identifies itself

and there is no way to conceal the fact that the browser is WebDriver driven.


Recommendations

However, some users have suggested approaches that can conceal the fact that the Mozilla Firefox browser is WebDriver controlled by using Firefox profiles and proxies, as follows:

compatible code

from selenium.webdriver import Firefox
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.options import Options

profile_path = r'C:\Users\Admin\AppData\Roaming\Mozilla\Firefox\Profiles\s8543x41.default-release'
options = Options()
# Load the existing profile. set_preference('profile', ...) would only create
# an unused preference named "profile" and would not load the profile itself.
options.profile = profile_path
# Route traffic through a local SOCKS proxy (type 1 = manual proxy configuration).
options.set_preference('network.proxy.type', 1)
options.set_preference('network.proxy.socks', '127.0.0.1')
options.set_preference('network.proxy.socks_port', 9050)
options.set_preference('network.proxy.socks_remote_dns', False)
service = Service('C:\\BrowserDrivers\\geckodriver.exe')
driver = Firefox(service=service, options=options)
driver.get("https://www.google.com")
driver.quit()

Other Alternatives

It has been observed that in some specific cases a few additional settings can bypass the detection. Note that these options apply to Chrome, not Firefox:

compatible code block

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# add_experimental_option exists only on Chrome's Options class,
# so this block targets Chrome and ChromeDriver.
options = Options()
# A second call with the same key overwrites the first, so both switches go in one list.
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
# Placeholder path; point it at your ChromeDriver executable.
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)

Potential Solution

A potential solution would be to use the Tor Browser as follows:

compatible code

from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.options import Options
import os

# Start the Tor process so that a SOCKS proxy is available on 127.0.0.1:9050.
torexe = os.popen(r'C:\Users\username\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe')
profile_path = r'C:\Users\username\Desktop\Tor Browser\Browser\TorBrowser\Data\Browser\profile.default'
firefox_options = Options()
# Load the Tor Browser profile. set_preference('profile', ...) would only create
# an unused preference named "profile" and would not load the profile itself.
firefox_options.profile = profile_path
# Send all traffic through the local Tor SOCKS proxy (type 1 = manual proxy configuration).
firefox_options.set_preference('network.proxy.type', 1)
firefox_options.set_preference('network.proxy.socks', '127.0.0.1')
firefox_options.set_preference('network.proxy.socks_port', 9050)
firefox_options.set_preference("network.proxy.socks_remote_dns", False)
# Use the Firefox binary bundled with Tor Browser.
firefox_options.binary_location = r'C:\Users\username\Desktop\Tor Browser\Browser\firefox.exe'
service = Service('C:\\BrowserDrivers\\geckodriver.exe')
driver = webdriver.Firefox(service=service, options=firefox_options)
driver.get("https://www.tiktok.com/")
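
Since the browser is pointed at a SOCKS proxy on 127.0.0.1:9050, it may help to confirm that something is actually listening there before starting Firefox. This is a rough sketch using only the standard library; the helper name `port_is_open` is hypothetical, and the default host and port are simply the values used in the code above.

```python
import socket


def port_is_open(host="127.0.0.1", port=9050, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `port_is_open()` returns False, the Tor process has probably not started (or has not finished starting), and the browser's requests through the proxy would fail. A True result only shows that some service accepts connections on that port, not that it is Tor.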
undetected Selenium
  • 2
    Thanks for your attention, but I wasn't successful with any of these three methods, using tiktok.com as an example on Linux with Selenium 4.1.3. (Also, it's probably better to edit your existing answer than make a new one.) With method 1 ("Recommendations"), TikTok still detects Selenium. With method 2 ("Other Alternatives"), I get `AttributeError: 'Options' object has no attribute 'add_experimental_option'`; is there a different version of Selenium that supports this? With method 3 ("Potential Solution"), I find that TikTok just returns "Access Denied" unconditionally for Tor. – Kodiologist Mar 29 '22 at 12:33
  • @Kodiologist I just wanted to keep the _`tor`_ example simple with regular Firefox, else with _Firefox Nightly_ evading the detection works just perfecto. – undetected Selenium Mar 29 '22 at 12:37
  • But then how do you use the Tor Browser with Firefox nightly? I imagine that just downloading Firefox nightly and replacing Tor Browser's Firefox executable with the new one wouldn't work. – Kodiologist Mar 29 '22 at 12:47
  • @Kodiologist Why do you feel it won't work? See [this](https://stackoverflow.com/a/53703144/7429447), [this](https://stackoverflow.com/a/62686067/7429447) and [this](https://stackoverflow.com/q/62666075/7429447) Python based examples. – undetected Selenium Mar 29 '22 at 13:10
  • I guess I misunderstood how the configuration works. Anyway, I unfortunately still get "Access Denied". Here's the exact code I used in case it helps: https://paste.rs/inZ.py I think the `popen` line is a no-op, by the way. Thanks for bearing with me. – Kodiologist Mar 29 '22 at 13:44
  • I think in this answer the solution for Chrome and Firefox is mixed up and in general it does not work for Firefox. For example the "Other alternatives" the chrome.service is imported instead of firefox.service and also firefox.options does not have add_experimental_option method (but chrome.options has this method). – eNca Oct 22 '22 at 11:41
  • add_experimental_option works for the website I scrap on linux today. Thanks! – Sebapi Aug 15 '23 at 18:51
-2

It may sound simple, but websites often detect Selenium (or bots in general) by tracking movements. If you make your program behave a bit more like a human browsing the site, you may get fewer CAPTCHAs: add cursor movements or page scrolling between your operations, along with other actions that mimic browsing, and add some delays. This will make your bot slower, but it may help it go undetected.
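
As a rough illustration of this idea, random pauses and uneven scrolling could look something like the sketch below. The function names, the delay ranges, and the step sizes are all arbitrary assumptions, not tuned values, and `driver` is assumed to be a Selenium WebDriver (only its `execute_script` method is used).

```python
import random
import time


def human_pause(min_s=0.5, max_s=2.0, sleep=time.sleep, rng=random):
    """Sleep for a random duration between min_s and max_s seconds and return it."""
    delay = rng.uniform(min_s, max_s)
    sleep(delay)
    return delay


def scroll_steps(total_px, step_min=200, step_max=600, rng=random):
    """Split a scroll distance into uneven positive steps that sum to total_px."""
    steps = []
    remaining = total_px
    while remaining > 0:
        # The last step is clipped so the total is never overshot.
        step = min(remaining, rng.randint(step_min, step_max))
        steps.append(step)
        remaining -= step
    return steps


def scroll_like_a_reader(driver, total_px, sleep=time.sleep):
    """Scroll down in small increments, pausing a random time after each one."""
    for step in scroll_steps(total_px):
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        human_pause(0.3, 1.2, sleep=sleep)
```

For example, calling `human_pause()` between two clicks and `scroll_like_a_reader(driver, 1500)` after a page load spaces the actions out irregularly. Whether this is enough depends entirely on what the site measures.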

Thanks

JAbr