I'm scraping real estate data. On sites generated with JavaScript, Selenium does a splendid job: you find the tags that hold the relevant information and loop over all of them with
driver.find_elements_by...
But on this site, the listings are produced by AngularJS. I tried the same approach:
for article in driver.find_elements_by_css_selector("div.property.ng-scope"):
    # do something
I figured out that I have to make my webdriver (PhantomJS) click the link leading to each individual listing's page:
linkbase = article.find_element_by_css_selector("div.info.clear.ng-scope")
link = linkbase.find_element_by_tag_name('a')
link.click()
Then the webdriver is simply pointed towards that site and I can get all the information I want for one listing.
As soon as one pass through the loop ends, I get the following error:
> Message: {"errorMessage":"Element does not exist in cache","request":{"headers":
{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","
Content-Length":"142","Content-Type":"application/json;charset=UTF-8","Host":"12
7.0.0.1:56577","User-Agent":"Python-urllib/3.4"},"httpVersion":"1.1","method":"P
OST","post":"{\"sessionId\": \"f9ec2c10-dfd9-11e5-9d4c-3bbe8f5bf7c0\", \"using\"
: \"css selector\", \"id\": \":wdc:1456856343349\", \"value\": \"div.info.clear.
ng-scope\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"elemen
t","directory":"/","path":"/element","relative":"/element","port":"","host":"","
password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/ele
ment","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/f9ec2c10-dfd9-
11e5-9d4c-3bbe8f5bf7c0/element/:wdc:1456856343349/element"}}
The element containing the link on the page is:
<a ng-href="/detail/prodej/dum/rodinny/jemnice-jemnice-/3800125532" ng-click="beforeOpen(i.iterator, i.regionTip)" class="title" href="/detail/prodej/dum/rodinny/jemnice-jemnice-/3800125532">
<span class="name ng-binding"> ... </a>
This is just the title text of each listing. I did set a user-agent following this answer, even though it doesn't appear in the error. I also wait until the surrounding element is loaded:
wait = WebDriverWait(driver, getSearchResults_CZ.waiting)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.content")))
What I want is to parse all these property elements, save their links to a list, and then loop through the list, opening each link with driver.get(). I know that clicking the link changes the driver's URL, but I thought that once the list of articles had been established with find_elements_by, it would serve as a stable reference point. Accessing the link by searching for the "a" tag and calling get_attribute('href') didn't work in this case with the AngularJS framework. What am I not seeing?
EDIT: As answered, get_attribute without .click() is the right way to go. My original error was related to the CSS selector: I had been using "div[class^='property']" and got a totally different link. I must have matched another element I hadn't noticed before.
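
For reference, here is a minimal sketch of the get_attribute approach, assuming the selectors from the question and the older find_element(s)_by_* Selenium API used above (newer Selenium releases use find_elements(By.CSS_SELECTOR, ...) instead). The names collect_listing_links, visit_listings and the handle_listing callback are hypothetical helpers of mine, not part of Selenium. The idea is to copy every href out as a plain string before navigating anywhere, since strings stay valid after the page changes while element references do not:

```python
def collect_listing_links(driver, item_selector="div.property.ng-scope"):
    """Return the href of the title link of every listing as plain strings.

    Assumes `driver` is a Selenium WebDriver already on the results page
    and that the selectors below match the page (taken from the question).
    """
    links = []
    for article in driver.find_elements_by_css_selector(item_selector):
        linkbase = article.find_element_by_css_selector("div.info.clear.ng-scope")
        anchor = linkbase.find_element_by_tag_name("a")
        href = anchor.get_attribute("href")
        if href:  # skip anchors without an href
            links.append(href)
    return links


def visit_listings(driver, links, handle_listing):
    """Open each saved URL and hand the driver to a caller-supplied callback.

    `handle_listing` is a hypothetical function that scrapes one detail page.
    """
    for url in links:
        driver.get(url)  # navigating no longer invalidates anything we hold
        handle_listing(driver)
```

Usage would be something like links = collect_listing_links(driver) followed by visit_listings(driver, links, scrape_one), where scrape_one is whatever function extracts the data from a single listing page. No element from the results page is touched after the first driver.get(), so the "Element does not exist in cache" error should not come up.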