40

I am running ChromeDriver through Selenium on an Ubuntu server behind a residential proxy network, yet my Selenium sessions are still being detected. Is there a way to make ChromeDriver and Selenium 100% undetectable?

I have been trying for so long that I have lost track of everything I have done, including:

  1. Trying different versions of Chrome
  2. Adding several flags and removing some words from the Chrome driver file.
  3. Running it behind a proxy (residential ones also) using incognito mode.
  4. Loading profiles.
  5. Random mouse movements.
  6. Randomising everything.

I am looking for a version of Selenium that is truly 100% undetectable, if such a thing exists, or another automation method that bot trackers cannot detect.

This is part of the code that starts the browser:

import random

from pyvirtualdisplay import Display
from selenium import webdriver

# useragents_desktop (a list of user-agent strings) and pluginfile (the path
# to the proxy extension) are defined elsewhere in the script.

sx = random.randint(1000, 1500)
sn = random.randint(3000, 4500)

display = Display(visible=0, size=(sx, sn))
display.start()

uag = random.choice(useragents_desktop)

chrome_options = webdriver.ChromeOptions()

# All preferences go into one dict: a second
# add_experimental_option("prefs", ...) call would replace the first one
# instead of merging with it.
preferences = {
    # this is to prevent ip leaking
    "webrtc.ip_handling_policy": "disable_non_proxied_udp",
    "webrtc.multiple_routes_enabled": False,
    "webrtc.nonproxied_udp_enabled": False,
    "profile.managed_default_content_settings.images": 2,
}
chrome_options.add_experimental_option("prefs", preferences)

chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-impl-side-painting")
chrome_options.add_argument("--disable-setuid-sandbox")
chrome_options.add_argument("--disable-seccomp-filter-sandbox")
chrome_options.add_argument("--disable-breakpad")
chrome_options.add_argument("--disable-client-side-phishing-detection")
chrome_options.add_argument("--disable-cast")
chrome_options.add_argument("--disable-cast-streaming-hw-encoding")
chrome_options.add_argument("--disable-cloud-import")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument("--ignore-certificate-errors")
chrome_options.add_argument("--disable-session-crashed-bubble")
chrome_options.add_argument("--disable-ipv6")
chrome_options.add_argument("--allow-http-screen-capture")
chrome_options.add_argument("--start-maximized")
chrome_options.add_argument("--window-size={},{}".format(sx - 10, sn - 10))
chrome_options.add_argument("--blink-settings=imagesEnabled=true")
chrome_options.add_argument("--user-agent=" + uag)
chrome_options.add_extension(pluginfile)  # this is for the residential proxy

driver = webdriver.Chrome(executable_path="/usr/bin/chromedriver", chrome_options=chrome_options)
Peter Mortensen
Grman
  • Have you looked at [this question](https://stackoverflow.com/questions/33225947/can-a-website-detect-when-you-are-using-selenium-with-chromedriver) for more information? How, and whether, bot detection is done differs per website. It might work via JavaScript or by checking things on the server; some automation tools also set an appropriate user-agent string, for example. – Bob Jun 10 '19 at 14:57
  • I did 99% of that and a lot more, and nothing is working. – Grman Jun 10 '19 at 15:20
  • Did you try overriding the user agent? Are you using a headless browser? – Adi Ohana Jun 10 '19 at 15:25
  • Simple answer: no, it's not possible. – Corey Goldberg Jun 10 '19 at 16:43
  • Possible duplicate of [Can a website detect when you are using selenium with chromedriver?](https://stackoverflow.com/questions/33225947/can-a-website-detect-when-you-are-using-selenium-with-chromedriver) – Corey Goldberg Jun 10 '19 at 16:46
  • Java Robot might be a solution. ( https://docs.oracle.com/javase/7/docs/api/java/awt/Robot.html ) I wouldn't think that could be detected server-side, but I really don't know. Using Selenium would be a moving target as companies who detect constantly change their tactics to detect. You might consider being a good internet citizen and not automating sites that don't want it done.... or asking permission first. – pcalkins Jun 10 '19 at 17:35

3 Answers

59

Whether a Selenium-driven WebDriver gets detected does not depend on any specific Selenium, Chrome or ChromeDriver version. Websites themselves can inspect the network traffic and identify the browser client (i.e. the web browser) as WebDriver-controlled.

However, some generic approaches to avoid getting detected while web scraping are as follows:

Antoine Vastel, in his blog post Detecting Chrome Headless, describes several approaches that distinguish a regular Chrome browser from a headless Chrome browser.

  • User agent: The user agent attribute is commonly used to detect the OS as well as the browser of the user. With Chrome version 59 it has the following value:

    Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/59.0.3071.115 Safari/537.36
    
    • A check for the presence of Chrome headless can be done through:

      if (/HeadlessChrome/.test(window.navigator.userAgent)) {
          console.log("Chrome headless detected");
      }
      
  • Plugins: navigator.plugins returns an array of the plugins present in the browser. On Chrome we typically find default plugins, such as Chrome PDF viewer or Google Native Client. In headless mode, by contrast, the returned array contains no plugins.

    • A check for the presence of Plugins can be done through:

      if(navigator.plugins.length == 0) {
          console.log("It may be Chrome headless");
      }
      
  • Languages: In Chrome, two JavaScript attributes expose the languages used by the user: navigator.language and navigator.languages. The first is the language of the browser UI, while the second is an array of strings representing the user's preferred languages. In headless mode, however, navigator.languages returns an empty string.

    • A check for the presence of Languages can be done through:

      if(navigator.languages == "") {
           console.log("Chrome headless detected");
      }
      
  • WebGL: WebGL is an API to perform 3D rendering in an HTML canvas. With this API, it is possible to query for the vendor of the graphic driver as well as the renderer of the graphic driver. With a vanilla Chrome and Linux, we can obtain the following values for renderer and vendor: Google SwiftShader and Google Inc.. In headless mode, we can obtain Mesa OffScreen, which is the technology used for rendering without using any sort of window system and Brian Paul, which is the program that started the open source Mesa graphics library.

    • A check for the presence of WebGL can be done through:

      var canvas = document.createElement('canvas');
      var gl = canvas.getContext('webgl');
      
      var debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
      var vendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL);
      var renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
      
      if(vendor == "Brian Paul" && renderer == "Mesa OffScreen") {
          console.log("Chrome headless detected");
      }
      
    • Not all headless Chrome builds report the same vendor and renderer values. Some keep values that can also be found in the non-headless version. However, Mesa OffScreen and Brian Paul indicate the presence of the headless version.

  • Browser features: The Modernizr library makes it possible to test whether a wide range of HTML and CSS features are present in a browser. The only difference we found between Chrome and headless Chrome was that the latter did not have the hairline feature, which detects support for hidpi/retina hairlines.

    • A check for the presence of hairline feature can be done through:

      if(!Modernizr["hairline"]) {
          console.log("It may be Chrome headless");
      }
      
  • Missing image: The last check on our list, which also seems to be the most robust, relies on the dimensions of the placeholder that Chrome uses when an image cannot be loaded. In vanilla Chrome, the image has a width and height that depend on the zoom level of the browser but are different from zero. In headless Chrome, the image has a width and a height equal to zero.

    • A check for the presence of Missing image can be done through:

      var body = document.getElementsByTagName("body")[0];
      var image = document.createElement("img");
      image.src = "http://iloveponeydotcom32188.jg";
      image.setAttribute("id", "fakeimage");
      body.appendChild(image);
      image.onerror = function() {
          if (image.width == 0 && image.height == 0) {
              console.log("Chrome headless detected");
          }
      }
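
As a rough illustration of how the checks above combine, here is a minimal Python sketch that applies the same tests to values already read from a browser (for example with Selenium's execute_script). The fingerprint dictionary, its key names and the function name are hypothetical, chosen only for this example. The conditions mirror the JavaScript snippets above and are not any vendor's actual detection logic.

```python
import re


def headless_signals(fingerprint):
    """Return the names of the headless-Chrome checks a fingerprint trips.

    `fingerprint` is a hypothetical dict of values collected from the page.
    A key that is missing is treated as "not measured" and is skipped.
    """
    signals = []

    # User agent: headless Chrome advertises "HeadlessChrome" by default.
    if re.search(r"HeadlessChrome", fingerprint.get("user_agent", "")):
        signals.append("user_agent")

    # Plugins: navigator.plugins.length == 0 is only a hint.
    if fingerprint.get("plugin_count") == 0:
        signals.append("plugins")

    # Languages: navigator.languages is empty.
    if fingerprint.get("languages") in ("", []):
        signals.append("languages")

    # WebGL: the Mesa OffScreen renderer together with the Brian Paul vendor.
    if (fingerprint.get("webgl_vendor") == "Brian Paul"
            and fingerprint.get("webgl_renderer") == "Mesa OffScreen"):
        signals.append("webgl")

    # Browser features: Modernizr reports no hairline support.
    if fingerprint.get("hairline") is False:
        signals.append("hairline")

    # Missing image: the broken-image placeholder measures 0 x 0.
    if fingerprint.get("broken_image_size") == (0, 0):
        signals.append("missing_image")

    return signals


# Example values taken from the descriptions above.
sample = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) HeadlessChrome/59.0.3071.115 Safari/537.36",
    "plugin_count": 0,
    "languages": "",
    "webgl_vendor": "Brian Paul",
    "webgl_renderer": "Mesa OffScreen",
    "hairline": False,
    "broken_image_size": (0, 0),
}
print(headless_signals(sample))
```

Returning the list of tripped checks rather than a single yes/no keeps the weak hints (plugins, hairline) distinguishable from the stronger ones (user agent, missing image).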
      


undetected Selenium
  • Hi, thanks for the info. I have done most of this and more, but I am still getting detected. Is there a link to a script that implements all of these things that I could use as a guide? – Grman Jun 10 '19 at 16:26
  • Selenium identifies itself: https://w3c.github.io/webdriver/#dom-navigatorautomationinformation-webdriver – Corey Goldberg Jun 10 '19 at 16:48
  • Hi, sorry for my "noob" question, but what's the point of changing the user agent if a bot keeps using the same IP address? In that case, wouldn't changing the user agent make the Selenium bot more suspicious and more likely to be blocked by the website? – Upchanges Mar 01 '22 at 00:29
14

Why not try undetected-chromedriver?

An optimized Selenium ChromeDriver patch that does not trigger anti-bot services such as Distil Networks / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.

Tested up to the current Chrome beta versions. It also works on Brave Browser and many other Chromium-based browsers. Python 3.6+.

You can install it with: pip install undetected-chromedriver

There are important things you should be aware of. Because of how the module works internally, you must browse programmatically (i.e. using .get(url)). Never use the GUI to navigate: using your keyboard and mouse for navigation can lead to detection. New tabs: same story. If you really need multiple tabs, open the tab with the blank page (hint: the URL is data:, including the comma, and yes, the driver accepts it) and do your thing as usual. If you follow these "rules" (which are actually the default behaviour), you will have a great time for now.

In [1]: import undetected_chromedriver as uc
In [2]: driver = uc.Chrome()
In [3]: driver.execute_script('return navigator.webdriver')
Out[3]: True  # Detectable
In [4]: driver.get('https://distilnetworks.com') # starts magic
In [5]: driver.execute_script('return navigator.webdriver')
Out[5]: None  # Undetectable!
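
The multi-tab advice above could be sketched as follows. This is only an illustration under assumptions: open_in_new_tab is a made-up helper name, and driver is assumed to be a Selenium-compatible WebDriver such as the object returned by uc.Chrome(). It uses only standard WebDriver calls (execute_script, window_handles, switch_to.window, get), and whether it avoids detection on a given site is not something the sketch can guarantee.

```python
def open_in_new_tab(driver, url):
    """Open `url` in a new tab without touching the browser GUI.

    Hypothetical helper: `driver` is assumed to expose the usual Selenium
    WebDriver interface.
    """
    # Remember which tabs exist so the new one can be identified reliably.
    known_handles = set(driver.window_handles)

    # Open the new tab on the blank "data:," page, as the rules above suggest.
    driver.execute_script("window.open('data:,', '_blank');")

    new_handles = [h for h in driver.window_handles if h not in known_handles]
    if not new_handles:
        raise RuntimeError("the browser did not open a new tab")

    # Switch to the new tab, then navigate programmatically with .get(url).
    driver.switch_to.window(new_handles[0])
    driver.get(url)
    return new_handles[0]
```

A call such as open_in_new_tab(driver, 'https://example.com') returns the handle of the new tab, which can later be passed to driver.switch_to.window() to come back to it.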
hans
  • Issue log for multi tab here: https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/27 – Lam Yip Ming Jul 27 '21 at 16:35
  • I had been manually modifying chromedriver to be undetectable, but then citi.com stopped allowing me to log in. This lets me log in to Citi again, and it is easier than doing it yourself. – poleguy Sep 19 '21 at 15:45
  • Not working for Google patents Download button :( (429) – Petar Ulev Jun 08 '22 at 13:05
  • How does this package compare to [selenium-stealth](https://pypi.org/project/selenium-stealth/)? – Andreas L. Oct 30 '22 at 09:50
-1

What about:

import random
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome("C:\\Users\\DusEck\\Desktop\\chromedriver.exe")
username = "username"  # data_user
password = "password"  # data_pass
driver.get("https://www.depop.com/login/")  # get URL
driver.find_element(By.XPATH, '/html/body/div[1]/div/div[3]/div[2]/button[2]').click()  # Accept cookies

# Type each credential one character at a time, pausing a random
# 0.1 to 0.8 seconds between keystrokes so the typing speed looks less mechanical.
for user_letter in username:
    time.sleep(random.uniform(0.1, 0.8))
    driver.find_element(By.ID, "username").send_keys(user_letter)

# The original loop filled split_char instead of split_char_pw, so the
# password was never typed; iterating over the string directly avoids that.
for pw_letter in password:
    time.sleep(random.uniform(0.1, 0.8))
    driver.find_element(By.ID, "password").send_keys(pw_letter)
  • A little more description regarding the rationale of your solution would help understand it better. – Andreas L. Oct 31 '22 at 07:49
  • Your code might solve the problem of getting detected because of typing speed, but not how to make the driver undetectable for websites using services like Cloudflare. – kaliiiiiiiii Jan 31 '23 at 18:07