The page I'm trying to scrape is http://zipatlas.com/us/oh/zip-code-comparison/population-below-poverty-level.1.htm
It loads some content through JavaScript, so I'm trying to use Selenium's expected_conditions module to detect when that content is present. The wait apparently finds the element I'm looking for, but when I print the page source, the element isn't in it. There's a link labeled "TEST LINK" at the bottom of the page, so I figured that once that link has loaded, most of the rest of the page has too.
Here is my code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
curr_url = r"http://zipatlas.com/us/oh/zip-code-comparison/population-below-poverty-level.1.htm"
driver = webdriver.Firefox()
driver.get(curr_url)
try:
    myElem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.LINK_TEXT, 'TEST LINK')))
except TimeoutException:
    print("took too long to load")
print("element detected")
elem = driver.find_element_by_link_text('TEST LINK')
html = elem.get_attribute("outerHTML")
print(html)
print(driver.page_source)
driver.close()
The detected element does print successfully, as <a href="">TEST LINK</a>
However, I cannot find it in the page_source output that is printed afterwards. The page source is located here. I also tried other expected_conditions, such as element_to_be_clickable.
So my question is: why does the located element not appear in the page source? Also, is there any other way to detect that the whole page has loaded? Using expected_conditions is the only potential solution I have found.
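For the second question, one commonly mentioned alternative is polling document.readyState through the driver's execute_script method. The following is only a minimal sketch of that idea: wait_for_ready_state is a hypothetical helper name, and the timeout and poll values are arbitrary assumptions. Note that readyState reaching "complete" does not guarantee that content added later by scripts is present, so it may not be enough for this page.

```python
import time


def wait_for_ready_state(driver, timeout=30.0, poll=0.5):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    `driver` is assumed to be a Selenium WebDriver (anything that exposes
    execute_script). Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Ask the browser for the document's current loading state.
        state = driver.execute_script("return document.readyState")
        if state == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

It would be called right after driver.get(curr_url), for example as `if not wait_for_ready_state(driver): print("took too long to load")`.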