Scraping hidden elements

Question

My question is twofold:

1) I'm attempting to log on to this page (source is here) using the code below. It's fine to use the credentials I've provided, which will expire 28 days from now, but it's relatively painless to create a trial account for those viewing this content after that.

from selenium import webdriver
driver_path = 'Path to my downloaded chromedriver.exe file'
url_login = 'https://www.findacode.com/signin.html'
username = 'jd@mailinator.com'
password = 'm%$)-Y95*^.1Gin+'

options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(executable_path=driver_path, chrome_options=options)

driver.get(url_login)
assert '_submit_check' in driver.page_source
driver.find_element_by_name('id').send_keys(username)
driver.find_element_by_name('password').send_keys(password)
driver.find_element_by_xpath("//input[@value='Sign In']").submit()

I receive the following error for all 3 elements:

selenium.common.exceptions.ElementNotVisibleException: Message: element not visible

My command of HTML/CSS/JavaScript isn't strong, but I've tried using waits per this thread and received a timeout. I was going to try ActionChains from that thread next, but I'd love to hear from someone with more knowledge on this about how to proceed.

2) Ultimately I want to scrape specific code history data from this url (source here) by varying the code (last 5 characters of the url) in a loop. A user has to be logged in, hence my first question above, and the way to view the information I'm after in the browser is to expand the light purple "Code History" table. The specific information I'm after is the date from any row where the Action column is 'Added' and the Notes column is 'Code Added':

Date       Action Notes 
2018-01-01 Added  First appearance in code
2017-02-01 Added  Code Added

My question here: since the table, which I believe is hidden, needs to be expanded with a click to expose the data I'm after, how do I proceed?

Edit: Here's code, pseudocode, and commentary to explain my 2nd question.

url_code = "https://www.findacode.com/code.php?set=CPT&c="
driver.get(url_code + '0001U')  # I'm presuming that this will preserve the login session
driver.find_element_by_id('history').click()  # I intend for this to expand the Code History section and expose the table shown earlier in the post, but it's not doing that
# pseudocode: check whether the phrase "Code Added" occurs in the page source
# pseudocode: if so, grab the date in the <td nowrap> tag that is 2 tags to the left

I can use BeautifulSoup for the last two lines if that's not possible with Selenium, but I need help understanding why I'm not seeing the data I want to scrape.
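As an illustration of those last two pseudocode lines, here is a minimal sketch using only the standard library's `html.parser` instead of BeautifulSoup. It assumes the expanded table's rows are plain `<tr>` elements with `<td>` cells in Date, Action, Notes order, as in the excerpt above; the `sample` markup and the names `TableRows` and `code_added_dates` are made up for the example, and the real page may be structured differently.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def code_added_dates(page_source):
    """Return the Date cell of each row whose Action is 'Added' and Notes is 'Code Added'."""
    parser = TableRows()
    parser.feed(page_source)
    return [row[0] for row in parser.rows
            if len(row) >= 3 and row[1] == "Added" and row[2] == "Code Added"]


# Hypothetical markup modelled on the table excerpt in the question.
sample = """
<table>
  <tr><td nowrap>2018-01-01</td><td>Added</td><td>First appearance in code</td></tr>
  <tr><td nowrap>2017-02-01</td><td>Added</td><td>Code Added</td></tr>
</table>
"""
print(code_added_dates(sample))
```

With Selenium, `driver.page_source` could be passed in place of `sample`, provided the table's rows are actually present in the page source at that point.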

score 3 · Answer 1 · answered Aug 22 '18 at 15:48

There are two forms on the page, each with inputs @name="id" and @name="password" and a "Sign In" button. The first one is hidden. You need to handle the form with @name="login":

form = driver.find_element_by_name('login')
form.find_element_by_name('id').send_keys(username)
form.find_element_by_name('password').send_keys(password)
form.find_element_by_xpath(".//input[@value='Sign In']").submit()  # leading "." keeps the search inside the form
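For readers on Selenium 4, where the `find_element_by_*` helpers were removed, the same form-scoped lookup might look like the sketch below. It is only a sketch under assumptions: `sign_in` is a name made up for the example, `driver` is assumed to be an already-created WebDriver with the sign-in page loaded, and the locator strategies are passed as the plain strings `"name"` and `"xpath"` (the values of `By.NAME` and `By.XPATH`) so the function itself needs no imports.

```python
def sign_in(driver, username, password):
    """Fill in and submit the visible form named 'login'."""
    # "name" and "xpath" are the string values of By.NAME and By.XPATH.
    form = driver.find_element("name", "login")
    form.find_element("name", "id").send_keys(username)
    form.find_element("name", "password").send_keys(password)
    # The leading "." keeps the search inside `form`; a bare "//input" would
    # search the whole document and could match the hidden form's button.
    form.find_element("xpath", ".//input[@value='Sign In']").submit()
```

It would be called as `sign_in(driver, username, password)` right after `driver.get(url_login)`.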

answered Aug 22 '18 at 15:48

Andersson

Thanks for pointing this out. Still coming up to speed on html. Any thoughts on the 2nd part of my question? – lajulajay Aug 22 '18 at 16:15
There is no code for second part, so issue is not quite clear... If you need to get several URLs changing the page number, you can define `counter = 1` and use `driver.get('https://www.findacode.com/code.php?set=CPT&c=000%sU' % counter)` in a loop + increment `counter` by `1` on each iteration – Andersson Aug 22 '18 at 16:22
I'll try to add some code for that in a bit but my challenge there is more about how I can get to the contents of the table – lajulajay Aug 22 '18 at 16:28
So there is a third issue also? :) It'd be better to create tickets for each issue as it will be hard to analyze the code and search for root-cause – Andersson Aug 22 '18 at 16:30
There isn't a 3rd issue, just a 2nd issue which apparently might not have been as clearly articulated. I'm just clarifying that the 2nd issue is more about getting to the data than looping through the codes. – lajulajay Aug 22 '18 at 16:32
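The loop over codes suggested in the comments can be sketched as follows. The `'000%sU' % counter` pattern only pads correctly for single-digit counters, so this version zero-pads with `%04d`. The helper name `code_url` and the range of codes are assumptions for illustration; whether each generated code exists on the site is not something the thread confirms.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.findacode.com/code.php"


def code_url(code, code_set="CPT"):
    """Build the page URL for one code, e.g. '0001U'."""
    return BASE_URL + "?" + urlencode({"set": code_set, "c": code})


# Hypothetical range of codes: 0001U, 0002U, 0003U.
codes = ["%04dU" % n for n in range(1, 4)]

for code in codes:
    url = code_url(code)
    print(url)
    # With a logged-in Selenium session, the page would be loaded here:
    # driver.get(url)
```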

score 1 · Answer 2 · answered Aug 22 '18 at 16:00

To log in to this website, you need to induce WebDriverWait for the desired elements to be clickable. You can use the following solution:

Code Block:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver_path = 'Path to my downloaded chromedriver.exe file'
url_login = 'https://www.findacode.com/signin.html'
username = 'jd@mailinator.com'
password = 'm%$)-Y95*^.1Gin+'

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver = webdriver.Chrome(chrome_options=options, executable_path=driver_path)  # use the path defined above
driver.get(url_login)
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//form[@name='login']//input[@name='id']"))).send_keys(username)
driver.find_element_by_xpath("//form[@name='login']//input[@name='password']").send_keys(password)
driver.find_element_by_xpath("//form[@name='login']//input[contains(@value,'Sign In')]").submit()
print("Logged In successfully")

Console Output:
```
Logged In successfully
```

Can you explain why the OP **needs** *to induce WebDriverWait* when it works perfectly without WebDriverWait? – Andersson, Aug 22 '18 at 16:05
@Andersson He asks people to use WebDriverWait in every one of his posts; I still don't have any idea why he does so! – Rajagopalan, Aug 22 '18 at 16:23
