EDIT: previous 'answer' was wrong so I have updated it.
Got you man, this is what you need to do:
1.) grab the latest Firefox
2.) grab the latest geckodriver
3.) use a Firefox driver:
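import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# Selenium 3 style API; adjust the path to wherever your geckodriver lives
driver = webdriver.Firefox(executable_path=r'd:\Python_projects\geckodriver.exe')
url = "https://seekingalpha.com"
driver.get(url)
# click the sign-in link through JavaScript
sign_in = driver.find_element_by_xpath('//*[@id ="sign-in"]')
driver.execute_script('arguments[0].click()', sign_in)
time.sleep(1)  # give the login form a moment to appear
email = driver.find_element_by_xpath('//*[@id ="authentication_login_email"]')
email.send_keys("xxxx@gmail.com")
pw = driver.find_element_by_xpath('//*[@id ="authentication_login_password"]')
pw.send_keys("xxxxxxxxx")
pw.send_keys(Keys.ENTER)  # submit the form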
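The time.sleep(1) above is only a guess at how long the login form takes to show up. A minimal sketch of polling instead, using nothing but the standard library. wait_until is a made-up helper name for this example, not part of Selenium (Selenium has its own WebDriverWait for the same job):

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper, not part of Selenium. Exceptions raised by
    condition() are treated as "not ready yet" and retried.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # e.g. the element is not in the DOM yet
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)
```

With the driver from the snippet above, the sleep could then become something like email = wait_until(lambda: driver.find_element_by_xpath('//*[@id ="authentication_login_email"]')).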
Explanation:
It is easy to detect whether Selenium is in use if the browser itself reports it (and it seems this page does not want to be scraped):
The webdriver read-only property of the navigator interface indicates whether the user agent is controlled by automation.
I looked for a way to bypass the detection and found this article.
Your best of avoiding detection when using Selenium would require you to use one of the latest builds of Firefox which don’t appear to give off any obvious sign that you are using Firefox.
I gave it a shot: after launch the correct page design loaded, and the login attempt gave the same result as a manual attempt.
With a bit more searching I also found that if you modify your chromedriver, you have a chance to bypass detection with Chrome as well.
Learned something new today too. \o/
An additional idea:
I ran a little experiment with embedded Chromium (CEF). If you open a Chrome window via Selenium, open the console and check navigator.webdriver, the result will be true. If you open a CEF window instead and then remote debug it, the flag will be false. I did not check edge cases, but non-edge-case scenarios should be fine with CEF.
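The same check can be done from Python instead of the console. A minimal sketch, assuming driver is any Selenium WebDriver (anything that exposes execute_script); the function name reports_automation is made up for this example:

```python
def reports_automation(driver):
    """Return True if the page sees navigator.webdriver as true.

    `driver` is assumed to be a Selenium WebDriver, i.e. any object
    with an execute_script method. The function name is hypothetical.
    """
    # navigator.webdriver can be true, false or undefined;
    # Selenium maps those to True, False and None.
    return driver.execute_script("return navigator.webdriver") is True
```

Calling it once on a plain chromedriver session and once on a session attached to CEF should show the difference described above.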
So what you may want to check out in the future:
1.) in command line: pip install cefpython3
2.) git clone https://github.com/cztomczak/cefpython.git
3.) open your CEF project and find hello.py in the examples
4.) update the startup to cef.Initialize(settings={"remote_debugging_port":9222})
5.) run hello.py
(this was the initial, one-time setup; you may customize it later, but the main thing is done: you have a browser with a debug port open)
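6.) modify Chrome startup to:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# hypothetical path, adjust to wherever your chromedriver lives
chrome_driver_executable = r'path\to\chromedriver.exe'
chrome_options = Options()
# attach to the already running CEF window instead of launching Chrome
chrome_options.debugger_address = "127.0.0.1:9222"
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=chrome_driver_executable)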
7.) now you have a driver without the 'automated' signature in the browser. There may be some drawbacks, like:
- CEF is not the very latest: right now the latest released Chrome is v76, while CEF is at v66.
- also "some stuff" may not work, e.g. window.Notification is not a thing in CEF
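Before attaching in step 6 it may help to confirm that the CEF window from step 5 is really listening on the debug port. A small standard-library sketch; debug_port_open is a made-up helper name, and the defaults simply mirror the 127.0.0.1:9222 address used above:

```python
import socket


def debug_port_open(host="127.0.0.1", port=9222, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper. It only checks that the port is reachable,
    not that a DevTools endpoint is actually behind it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, unreachable, ...
        return False
```

If this returns False, hello.py is probably not running or was started without the remote_debugging_port setting.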