
I am trying to use Selenium in Python to pull some data from https://www.seekingalpha.com. The front page has a "Sign-in/Join now" link. I used Selenium to click it, which brought up a popup asking for sign-in information with another "Sign in" button. My code below seems to enter my username and password correctly, but my attempt to click the "Sign in" button didn't get the right response (it clicked on the ad below the popup box).

I am using Python 3.5.

Here is my code:

from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
url = "https://seekingalpha.com"
driver.get(url)
sleep(5)

# Open the "Sign-in/Join now" popup
driver.find_element(By.XPATH, '//*[@id="sign-in"]').click()
sleep(5)

# Fill in the credentials and press the popup's "Sign in" button
driver.find_element(By.XPATH, '//*[@id="authentication_login_email"]').send_keys("xxxx@gmail.com")
driver.find_element(By.XPATH, '//*[@id="authentication_login_password"]').send_keys("xxxxxxxxx")
driver.find_element(By.XPATH, '//*[@id="log-btn"]').click()

Any advice/suggestion is greatly appreciated.
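The symptom (the click landing on an ad instead of the button) suggests that something else was on top of the button at the moment of the click. One way to check is to ask the page which element is topmost at the button's centre using the DOM's `document.elementFromPoint`. A minimal sketch; the helper name `click_would_be_intercepted` is hypothetical, and it only assumes a Selenium-style driver object with `execute_script`:

```python
def click_would_be_intercepted(driver, element):
    """Return True if another element is on top of `element`'s centre point.

    `driver` is anything with a Selenium-style execute_script(script, *args).
    """
    # Find the topmost element at the centre of the target's bounding box.
    # elementFromPoint returns null when that point is outside the viewport,
    # in which case this reports False (nothing covers it on screen).
    # A hit on a child of the target (e.g. a <span> inside the button) is
    # not counted as an interception, since the click bubbles up to it.
    script = (
        "const el = arguments[0];"
        "const r = el.getBoundingClientRect();"
        "const hit = document.elementFromPoint(r.left + r.width / 2, r.top + r.height / 2);"
        "return hit !== null && hit !== el && !el.contains(hit);"
    )
    return bool(driver.execute_script(script, element))
```

For example, calling it on the `log-btn` element right before the click and getting `True` back would confirm that another element is covering the button.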

Asked by Bin Chen (edited by Trapli)

  • Sending the Enter key in the password field usually works... – pcalkins Sep 03 '19 at 19:39
  • Thank you pcalkins and Trapli. I followed your suggestions and tried both approaches (driver.execute_script and sending the ENTER key to the password field), but neither works. I did notice that even if I manually clicked the "Sign in" button in the browser opened by driver.get("https://seekingalpha.com"), there was no response. However, if I opened seekingalpha.com in a Chrome browser manually, brought up the authentication popup manually, and then clicked the "Sign in" button, it responded. Why is that? Do I need to somehow set the simulated control to the popup? – Bin Chen Sep 03 '19 at 20:33

2 Answers


EDIT: my previous answer was wrong, so I have updated it.

Got you man, this is what you need to do:
1.) grab the latest Firefox
2.) grab the latest geckodriver
3.) use the Firefox driver

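import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Selenium 3 signature; Selenium 4 takes service=Service(path) instead of executable_path
driver = webdriver.Firefox(executable_path=r'd:\Python_projects\geckodriver.exe')

url = "https://seekingalpha.com"
driver.get(url)

# Click the "Sign-in/Join now" link via JavaScript
sign_in = driver.find_element(By.XPATH, '//*[@id="sign-in"]')
driver.execute_script('arguments[0].click()', sign_in)
time.sleep(1)  # give the login popup a moment to appear

email = driver.find_element(By.XPATH, '//*[@id="authentication_login_email"]')
email.send_keys("xxxx@gmail.com")
pw = driver.find_element(By.XPATH, '//*[@id="authentication_login_password"]')
pw.send_keys("xxxxxxxxx")
pw.send_keys(Keys.ENTER)  # submit by pressing Enter instead of clicking "Sign in"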

Explanation:

It is easy for a site to detect whether Selenium is being used if the browser exposes that information (and it seems this page does not want to be scraped):

The webdriver read-only property of the navigator interface indicates whether the user agent is controlled by automation.
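You can see what a page sees by evaluating the same property from Selenium with `execute_script`. A minimal sketch; the helper name `automation_flag` is hypothetical. Per the experiment described later in this answer, a Selenium-launched Chrome should report `True` here, while the CEF setup reports `False`:

```python
def automation_flag(driver):
    """Return what the current page sees in navigator.webdriver.

    Selenium converts the JavaScript result to Python:
    true -> True, false -> False, undefined -> None.
    """
    return driver.execute_script("return navigator.webdriver;")
```

Call it after `driver.get(url)` to compare Chrome, Firefox and CEF sessions side by side.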

I looked for a way to bypass this detection and found this article:

Your best of avoiding detection when using Selenium would require you to use one of the latest builds of Firefox which don’t appear to give off any obvious sign that you are using Firefox.

I gave it a shot: after launch, the correct page design loaded, and the login attempt behaved the same as the manual attempt.

With a bit more searching, I also found that if you modify your chromedriver, you have a chance of bypassing detection even with Chrome.

Learned something new today too. \o/

An additional idea:

I ran a little experiment using embedded Chromium (CEF). If you open a Chrome window via Selenium, open the console, and check navigator.webdriver, the result will be true. If you instead open a CEF window and then attach to it through remote debugging, the flag will be false. I did not check edge cases, but non-edge-case scenarios should be fine with CEF.

So what you may want to check out in the future:

1.) in command line: pip install cefpython3
2.) git clone https://github.com/cztomczak/cefpython.git
3.) open your CEF project and find hello.py in the examples
4.) update the startup to cef.Initialize(settings={"remote_debugging_port":9222})
5.) run hello.py
(This is the initial, one-time setup. You may customize it later, but the main thing is done: you have a browser with a debug port open.)
6.) modify the Chrome driver startup to:

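from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Hypothetical path: point this at your own chromedriver binary
chrome_driver_executable = r'path\to\chromedriver.exe'

chrome_options = Options()
# Attach to the already running CEF browser instead of launching a new Chrome
chrome_options.debugger_address = "127.0.0.1:9222"
# Selenium 3 signature; Selenium 4 takes service=Service(path) instead of executable_path
driver = webdriver.Chrome(options=chrome_options, executable_path=chrome_driver_executable)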

7.) now you have a driver without the 'automated' signature in the browser. There may be some drawbacks, such as:

  • CEF is not the very latest: right now the latest released Chrome is v76, while CEF is v66.
  • Some features may not work; for example, window.Notification does not exist in CEF.
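Before step 6 attaches Selenium, it can help to confirm that the CEF window from step 5 is actually listening on the debug port. A minimal sketch, assuming CEF serves Chromium's standard DevTools HTTP endpoint `/json` (the list of debuggable targets) on the port set via `remote_debugging_port`; the helper name `list_debug_targets` is hypothetical:

```python
import json
import urllib.request


def list_debug_targets(address="127.0.0.1:9222", timeout=2.0):
    """Return the DevTools targets listed at http://<address>/json, or None.

    None means nothing usable answered on that port (CEF not running yet,
    wrong port, or a non-JSON reply).
    """
    # Talk to the local port directly; never route this through an HTTP proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open("http://{}/json".format(address), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # json.JSONDecodeError is a ValueError subclass.
        return None
```

If it returns a non-empty list, the browser is up and `debugger_address = "127.0.0.1:9222"` has something to attach to; `None` means CEF is not listening yet.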
Answered by Trapli
  • @Trapli Thank you very much!!! This works beautifully. And it is much more straightforward than to modify the chromedriver. – Bin Chen Sep 04 '19 at 04:18
  • @Trapli, what is the options() in the line "chrome_options = Options()"? Thanks. – Bin Chen Sep 18 '19 at 16:58
  • @BinChen `from selenium.webdriver.chrome.options import Options` – Trapli Sep 19 '19 at 08:33
  • added to answer too – Trapli Sep 19 '19 at 12:26

I tried the code you provided and it works fine. I added Selenium explicit waits just to check other options, and those also worked well. I replaced the two sleeps with waits:

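from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
url = "https://seekingalpha.com"
driver.get(url)

# Wait up to 10 s for the sign-in link instead of sleep(5)
wait = WebDriverWait(driver, 10)
signin = wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id='sign-in']")))
signin.click()

# Wait for the email field in the popup instead of the second sleep(5)
email = wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id='authentication_login_email']")))
email.send_keys("xxxx@gmail.com")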

And it does click the Sign in button. What I found is that the site has captcha handling: checking the console after clicking the Sign in button tells the story. I went ahead and added a user agent to your script, but that did not work either. Notice the blockscript parameter in the response of the login API and the console errors in the screenshots below. However, no captcha appears on the UI.

[Screenshot: console error]

[Screenshot: API error]
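The explicit waits in this answer beat fixed `sleep(5)` calls because they return as soon as the condition holds. Conceptually, `WebDriverWait(driver, 10).until(...)` re-evaluates the condition every 0.5 seconds by default, treats `NoSuchElementException` as "not yet", and raises `TimeoutException` once the timeout expires. The sketch below is a simplified plain-Python illustration of that polling loop, not Selenium's actual code; `wait_until` and `WaitTimeout` are hypothetical names, and `LookupError` stands in for Selenium's ignored exceptions:

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy in time (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll=0.5, ignored=(LookupError,)):
    """Poll condition() until it returns something truthy, then return it.

    Exceptions listed in `ignored` are treated as "not ready yet", much as
    WebDriverWait ignores NoSuchElementException by default.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within {} s".format(timeout))
        time.sleep(poll)
```

Compared with a fixed sleep, this returns as soon as the element is ready and fails loudly instead of carrying on with an element that never appeared.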

Answered by Dev
  • Each and every aspect mentioned in the answer is just perfect, though the blocking issue is still not conclusive to me. – undetected Selenium Sep 03 '19 at 21:04
  • @DebanjanB when the script clicks the Sign in button, there is no error message on the UI and nothing changes. – Dev Sep 03 '19 at 21:10
  • Correct, I am not sure why Chrome is complaining _...resource interpreted as document but transferred with mime type image/gif..._, still clueless. – undetected Selenium Sep 03 '19 at 21:12
  • @dev thanks for pointing out that I had totally misinterpreted the problem. After checking the page a bit more thoroughly, I found that even the page design differs depending on how it was opened. – Trapli Sep 03 '19 at 21:16
  • I think I have an answer for the problem. I have updated my original answer. – Trapli Sep 03 '19 at 22:13
  • @dev Thank you for diagnosing the cause. Great finding! – Bin Chen Sep 04 '19 at 04:20