I am trying to scrape data from a website using Selenium and PhantomJS in Python. However, this website adds the data I'm interested in via JavaScript. Is there a way to ask Selenium to wait for the data before returning it? So far, we've tried:
import contextlib
import selenium.webdriver as webdriver
import selenium.webdriver.support.ui as ui

phantomjs = '/usr/local/bin/phantomjs'
url = '[redacted]'
with contextlib.closing(webdriver.PhantomJS(phantomjs)) as driver:
    driver.get(url)
    wait = ui.WebDriverWait(driver, 10)
    wait.until(lambda driver: driver.execute_script("return document.getElementById(\"myID\").innerText").startswith('[redacted]'))
    driver.execute_script("return document.getElementById(\"myID\").innerText")
Unfortunately, this code raises selenium.common.exceptions.TimeoutException: Message: None because the content of the id we're reading doesn't change.
We are using PhantomJS 1.9.7, Python 2.7.5 in a virtualenv, and Selenium 2.41.0. Is this the right way to do it, or are we missing something? Does anyone have a better method?
Thanks in advance.
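For what it's worth, the lambda passed to wait.until can be factored into a named, reusable condition that also guards against the element not existing yet (in which case innerText would blow up). This is just a sketch of that shape; element_text_startswith is a hypothetical helper, not a Selenium API:

```python
def element_text_startswith(element_id, prefix):
    # Build a reusable wait condition in the shape WebDriverWait.until
    # expects: a callable taking the driver and returning a truthy value.
    # The ternary in the script returns null until the element exists,
    # instead of throwing on getElementById(...).innerText.
    script = ('var el = document.getElementById("%s"); '
              'return el ? el.innerText : null;' % element_id)

    def condition(driver):
        text = driver.execute_script(script)
        # execute_script yields None until the element appears,
        # so guard before calling startswith.
        return bool(text) and text.startswith(prefix)

    return condition

# Usage inside the `with` block from the question:
#   wait.until(element_text_startswith("myID", "[redacted]"))
```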
EDIT
Following @ExperimentsWithCode's comment, we tried looping until the content is loaded:
with contextlib.closing(webdriver.PhantomJS(phantomjs)) as driver:
    driver.get(url)
    wait = ui.WebDriverWait(driver, 10)
    found = False
    while not found:
        try:
            wait.until(lambda driver: driver.execute_script("return document.getElementById(\"myID\").innerText").startswith('[redacted]'))
            driver.execute_script("return document.getElementById(\"myID\").innerText")
            found = True
        except:
            print "Not found"
            pass
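One caveat with this loop: it spins forever if the text never appears, because the bare except swallows the TimeoutException every ten seconds. A bounded version of the same retry idea could look like this (standalone sketch; poll_until is a hypothetical helper, not part of Selenium):

```python
import time

def poll_until(predicate, timeout=30, interval=1):
    # Minimal sketch of a bounded retry loop: call predicate() until it
    # returns a truthy value, giving up after `timeout` seconds instead
    # of retrying forever. Not a Selenium API -- just the bare pattern.
    deadline = time.time() + timeout
    while True:
        value = predicate()
        if value:
            return value
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %s seconds" % timeout)
        time.sleep(interval)
```

This would replace the while not found loop, with the JavaScript check wrapped in a zero-argument callable.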