
I want to use Selenium/WebDriver to simulate a browser and scrape some website content with it. Even if it's not the fastest method, it has many advantages for me, such as executing scripts.

Many websites forbid automated access, for example search engines like Google or Bing.

For one tool I need to scrape the estimated result stats from Google for several keywords. The process looks like this: simulate a browser that visits google.com, types in a keyword and scrapes the results, then after a short pause types in the next keyword, scrapes the results, and so on.
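For the scraping step, the estimated count usually has to be extracted from a line of text. Below is a minimal Python sketch. It assumes the English result-stats format "About 1,230,000 results (0.45 seconds)". The exact wording and digit separators depend on locale and can change at any time. `parse_result_count` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: English result-stats text such as "About 1,230,000 results (0.45 seconds)".
# Wording and digit separators vary by locale and may change without notice.
_STATS_RE = re.compile(r"(\d[\d,.\u00a0 ]*)\s+results?\b", re.IGNORECASE)


def parse_result_count(stats_text: str) -> Optional[int]:
    """Return the estimated number of results, or None if no count is found."""
    match = _STATS_RE.search(stats_text)
    if match is None:
        return None
    # Strip thousands separators (comma, period, space, non-breaking space).
    digits = re.sub(r"\D", "", match.group(1))
    return int(digits)
```

For example, `parse_result_count("About 1,230,000 results (0.45 seconds)")` gives `1230000`, while text without a count gives `None`.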

My question is: can a website recognize that I'm using Selenium to simulate the browser instead of using the browser by hand? The Google case in particular gives me some doubts. I know Selenium is partly developed by Google, or at least by some people working for Google. So does Selenium leave fingerprints, or is it impossible to tell whether I'm using the browser myself or simulating it with Selenium, even for Google?

Artjom B.
zwieback86

3 Answers


No. With WebDriver, nobody can actually see that you're using Selenium rather than operating the browser by hand. I'm not sure about the old Selenium RC, but it should work the same way. Here's how it works:

  1. Selenium opens up a browser with a clean profile (or with a profile you selected)
  2. Selenium is hooked up to the browser so it can steer and control it, but the browser still does most of the work. Basically, Selenium replaces the user's input to the browser, nothing more.

You can easily verify this by reading the contents of the HTTP headers sent by your browser.
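One way to run that check is to point a Selenium-driven browser and a manually opened browser at a tiny local server that echoes back the request headers. You then compare the two outputs. Below is a minimal Python sketch that uses only the standard library. The handler name, `start_header_echo_server`, and the port are illustrative assumptions.

```python
import http.server
import threading


class HeaderEchoHandler(http.server.BaseHTTPRequestHandler):
    """Answers every GET request with the request headers it received."""

    def do_GET(self):
        # self.headers is an email.message.Message; items() keeps the received order.
        body = "".join(f"{name}: {value}\n" for name, value in self.headers.items())
        print(body)  # log to the console so two visits can be compared side by side
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence the default per-request access log


def start_header_echo_server(port=8000):
    """Serve on 127.0.0.1 in a background thread; port=0 picks a free port."""
    server = http.server.HTTPServer(("127.0.0.1", port), HeaderEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To use it, start it with `start_header_echo_server(8000)` and open http://127.0.0.1:8000/ by hand. Then open the same URL with Selenium (for example `driver.get("http://127.0.0.1:8000/")`) and compare the printed headers. This only compares HTTP headers, not what JavaScript on the page can see.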

If you ever actually need Selenium to be recognized by your server, you can use BrowserMob Proxy and add a custom header to your requests.


All that said, there is one thing you must be aware of. While there's no way to detect Selenium directly, the website you're visiting can pick up some indirect clues. Detection usually means scanning for too many requests made in virtually no time, which might be an issue for you. Make sure your Selenium script behaves like a user.
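A simple way to avoid firing requests "in virtually no time" is to wait a randomized interval between keywords. Below is a minimal Python sketch. The 5 to 15 second range is an arbitrary assumption, not a known detection threshold, and `paced` is a hypothetical helper. The sketch removes the most obvious timing signal but guarantees nothing.

```python
import random
import time


def paced(items, min_delay=5.0, max_delay=15.0, sleep=time.sleep, rng=None):
    """Yield items one by one, sleeping a random interval between consecutive items.

    The delay is drawn uniformly from [min_delay, max_delay] seconds; there is
    no sleep before the first item or after the last one.
    """
    if not 0 <= min_delay <= max_delay:
        raise ValueError("expected 0 <= min_delay <= max_delay")
    if rng is None:
        rng = random.Random()
    for index, item in enumerate(items):
        if index > 0:
            sleep(rng.uniform(min_delay, max_delay))
        yield item
```

Assuming you have a WebDriver instance `driver`, the keyword loop becomes `for keyword in paced(keywords): ...`, with the search and scraping inside the loop body. The `sleep` and `rng` parameters exist only so the delays can be tested without actually waiting.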


EDIT 2016/04:

Apparently it is possible: https://stackoverflow.com/a/33403473/2930045 states that a company can do it. My guess - and it is nothing but a guess - is that they run some JS that detects what Selenium installs into the browser to operate it.

Community
Petr Janeček
  • Thanks for your answer, it was really clear. Before marking it as accepted I will wait some time; maybe there are other opinions. Don't get me wrong, I like your answer, but maybe there are other invisible methods for recognizing Selenium, especially for Google, because I can imagine it's in their interest to recognize automated browsers. Thanks a lot! – zwieback86 Jul 15 '13 at 12:41
  • Hey Slanec, today I tried out the WebDriver for Firefox and noticed the "webdriver" text in the lower right corner, and I also saw that an add-on named "Firefox Webdriver 2.33.0" is installed. That made me somewhat suspicious. Are you really sure that a webpage cannot see which add-ons I use in my Firefox? I don't get the point of this "webdriver" sign in the status bar. – zwieback86 Jul 17 '13 at 21:16
  • @zwieback86 No, nobody can detect your Firefox plugin (unless the plugin deliberately makes this possible). See http://stackoverflow.com/questions/5067375/detecting-my-own-firefox-extension-from-a-webpage and/or http://webdevwonders.com/detecting-firefox-add-ons/. – Petr Janeček Jul 20 '13 at 15:41
  • @Slanec, this answer is no longer 100% true. It's true for most cases, but not all: http://stackoverflow.com/a/33403473/2930045 It looks like a company offers a service that blocks Selenium-based bots. – RattleyCooper Apr 04 '16 at 17:53

Signs point to yes: sites are able to recognize that you are using Selenium.
Counterexample: www.stubhub.com detects and blocks my browser instance launched using Selenium, while "normal" browsing done manually (not using the browser launched by Selenium WebDriver) works without issue.

See this Stack Overflow question for additional details: Can a website detect when you are using selenium with chromedriver?

Community
Brian Cain

Yes. By default, the browser literally says that it is being driven by WebDriver. In JavaScript, for example, navigator.webdriver returns true when the browser is under WebDriver control. There are some basic ways to prevent this, though. One example is the Python package undetected_chromedriver, which is plug and play.
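To show how a site could use this flag, here is a minimal sketch of a local test page built with the Python standard library. All names and the `/report` endpoint are hypothetical. The page's script reads `navigator.webdriver` and reports the value back to the server. From the Selenium side, you can read the same value with `driver.execute_script("return navigator.webdriver")`. Real detection scripts combine many more signals than this one flag.

```python
import http.server
import threading
import urllib.parse

# The page reads navigator.webdriver and reports it back with a GET request.
CHECK_PAGE = b"""<!doctype html>
<html><body>
<script>
  // true when the browser is under WebDriver control (unless it has been patched)
  fetch("/report?webdriver=" + String(navigator.webdriver === true));
</script>
</body></html>
"""

reports = []  # values received from visiting browsers: "true" or "false"


class WebdriverCheckHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        url = urllib.parse.urlsplit(self.path)
        if url.path == "/report":
            # Record the flag value sent by the page's script.
            value = urllib.parse.parse_qs(url.query).get("webdriver", ["unknown"])[0]
            reports.append(value)
            body, ctype = b"ok", "text/plain"
        else:
            body, ctype = CHECK_PAGE, "text/html"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request access log


def start_check_server(port=0):
    """Serve on 127.0.0.1 in a background thread; port=0 picks a free port."""
    server = http.server.HTTPServer(("127.0.0.1", port), WebdriverCheckHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Open the page once by hand and once through Selenium, then inspect `reports`. A patched driver such as undetected_chromedriver aims to make both visits report "false".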

As a side note, if you start botting a lot, Google will start fingerprinting your entire device via the normal fingerprinting methods, and undetected_chromedriver does not prevent this, because those methods are exactly the same as in a normal browser. As a result, Google might start blocking your normal browser activities too. For example, your GPU could be identified from differences in font rendering. Read more about this on Wikipedia, although take everything you read there with a grain of salt; I have read some weird things on there. https://en.wikipedia.org/wiki/Device_fingerprint
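Conceptually, fingerprinting collects many device attributes and combines them into one identifier. That is why the same machine can be recognized whether it is driven by Selenium or by hand. Below is a deliberately simplified Python sketch. The attribute names are hypothetical. Real systems use fuzzy matching over many more signals rather than a single exact hash.

```python
import hashlib
import json


def fingerprint_id(attributes: dict) -> str:
    """Hash a set of observed device attributes into a stable identifier."""
    # Canonical JSON (sorted keys, fixed separators) so the same attributes
    # always produce the same hash regardless of insertion order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

For example, `{"screen": "1920x1080", "timezone": "Europe/Berlin", "canvas_hash": "3f2a"}` hashes to the same value however the keys are ordered. Changing any one attribute, such as the screen size, yields a different identifier.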

obe