extracting links with a specific class with Selenium in Python

Question

I am trying to extract links from an infinite-scroll website.

This is my code for scrolling down the page:

import time
from selenium import webdriver

driver = webdriver.Chrome('C:\\Program Files (x86)\\Google\\Chrome\\chromedriver.exe')
driver.get('http://seekingalpha.com/market-news/top-news')
for i in range(0,2):
    driver.implicitly_wait(15)
    # scroll to the bottom so the page loads the next batch of items
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(20)

I want to extract specific links from this page: those with class "market_current_title", whose HTML looks like the following:

<a class="market_current_title" href="/news/3223955-dow-wraps-best-week-since-2011-s-and-p-strongest-week-since-2014" sasource="titles_mc_top_news" target="_self">Dow wraps up best week since 2011; S&amp;P in strongest week since 2014</a>

When I used

URL = driver.find_elements_by_class_name('market_current_title')

I got the error "stale element reference: element is not attached to the page document". Then I tried

URL = driver.find_elements_by_xpath("//div[@id='a']//a[@class='market_current_title']")

but it says that there is no such link. Do you have any idea how to solve this problem?

score 1 · Accepted Answer · edited May 23 '17 at 12:07


You're probably trying to interact with an element that has already changed (most likely elements that are now above your scroll position and off screen). Try this answer for some good options on how to overcome this.

Here's a snippet:

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
import selenium.webdriver.support.expected_conditions as EC
import selenium.webdriver.support.ui as ui

# Return True if the element becomes visible within `timeout` seconds, otherwise False.
# `driver` is passed in explicitly; `locator` is a CSS selector string.
def is_visible(driver, locator, timeout=2):
    try:
        ui.WebDriverWait(driver, timeout).until(EC.visibility_of_element_located((By.CSS_SELECTOR, locator)))
        return True
    except TimeoutException:
        return False
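
One way to avoid holding on to elements that may have changed is to copy each link's href into a plain string as soon as the element is located, and to locate the elements again if Selenium reports a stale reference. Here is a minimal sketch of that idea. `read_hrefs` and its `attempts` parameter are hypothetical names, not part of Selenium, and the fallback exception class is only there so the snippet can be read and exercised without Selenium installed.

```python
try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:
    # Stand-in (an assumption for illustration) when Selenium is not installed.
    class StaleElementReferenceException(Exception):
        pass


def read_hrefs(driver, class_name="market_current_title", attempts=3):
    """Return the href of every element with the given class, as strings.

    read_hrefs is a hypothetical helper, not a Selenium function.
    """
    for attempt in range(attempts):
        try:
            # Locate the elements afresh on every attempt; "class name" is
            # the string value behind By.CLASS_NAME.
            elements = driver.find_elements("class name", class_name)
            # Copy the attribute out right away so nothing depends on the
            # WebElement objects after this line.
            return [element.get_attribute("href") for element in elements]
        except StaleElementReferenceException:
            # The page changed between locating and reading; try again,
            # or give up and re-raise on the last attempt.
            if attempt == attempts - 1:
                raise
    return []
```

With a live browser this would be called as `read_hrefs(driver)` after each scroll; the returned strings stay valid even if the page later replaces the elements.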

edited May 23 '17 at 12:07

Community


answered Nov 13 '16 at 08:51

Moshisho


Thanks Moshisho, those answers are implemented mainly in Java, JavaScript and C#. I couldn't get the same solution working in Python. – mk_sch Nov 13 '16 at 12:20
That seems very nice. I tried it like this: "elements = wait.until(driver.find_elements_by_class_name('market_current_title'))" but I got an error that says "'list' object is not callable". Very strange! – mk_sch Nov 14 '16 at 06:31
I am wondering: if, each time I scroll down the page, I grab the links and then scroll down again to get the new links, I think I shouldn't face this problem. Do you have any idea how to do that? – mk_sch Nov 14 '16 at 06:43
The "list object.." error is probably because you can't do `wait.until` on a list; do it for each element (and not element*s*), then scroll again and take the new links until you're at the end of the scroll. Use a while loop. BTW, if my answer is helpful you can upvote it... – Moshisho Nov 14 '16 at 13:50
Thank you Moshisho, of course your answers were very helpful. As I am new to programming, could you give me some hints, in code, on how to scroll and take the links each time? I am doing "for i in range(0,2): driver.implicitly_wait(15) driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")" to scroll down; I don't know how to do it once, get the links, and then go on. – mk_sch Nov 14 '16 at 16:07
I'm not near a PC now but the idea is to use while(scrollingIsPossible) { scroll; addNewLinks; } – Moshisho Nov 14 '16 at 16:35
I really appreciate your help. I would be grateful if you could provide more details later on; it's part of my research, which I have been stuck on for many days, so any help would be very welcome. – mk_sch Nov 14 '16 at 16:42
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/128081/discussion-between-farshidbalan-and-moshisho). – mk_sch Nov 14 '16 at 17:09
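
The "scroll, add new links, repeat while scrolling is possible" idea from the comments could be sketched in Python roughly as follows. This is only an illustration under some assumptions: `collect_links`, `pause` and `max_scrolls` are made-up names and values, "scrolling is possible" is taken to mean that `document.body.scrollHeight` still grows after a scroll, and the hrefs are read with a single JavaScript call (a substitution, not something proposed in the thread) so that no element objects are kept between scrolls.

```python
import time


def collect_links(driver, selector="a.market_current_title",
                  pause=2.0, max_scrolls=50):
    """Scroll until the page stops growing; return unique hrefs in order.

    collect_links, pause and max_scrolls are illustrative, not Selenium API.
    """
    get_height = "return document.body.scrollHeight;"
    get_hrefs = ("return Array.from(document.querySelectorAll(arguments[0]),"
                 " a => a.href);")
    links = {}  # a dict keeps insertion order and drops duplicates
    last_height = driver.execute_script(get_height)
    for _ in range(max_scrolls):
        # addNewLinks: the browser hands back plain strings
        for href in driver.execute_script(get_hrefs, selector):
            links.setdefault(href, None)
        # scroll
        driver.execute_script(
            "window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load the next batch
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # the page did not grow, so scrolling is no longer possible
        last_height = new_height
    # Final pass picks up anything loaded by the last scroll.
    for href in driver.execute_script(get_hrefs, selector):
        links.setdefault(href, None)
    return list(links)
```

After `driver.get(...)` on the page from the question, `collect_links(driver)` would return the list of link URLs. The `pause` value probably needs tuning for the site, since a page that loads slowly can look as if it has stopped growing.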
