I am new to Selenium and need to scrape a website that contains a list of links structured exactly like:

<a class="unique" href="...">
    <i class="something"></i>
    "Text - "
    <span class="something">Text</span>
</a>
<a class="unique" href="...">
    <i class="something"></i>
    "Text - "
    <span class="something">Text</span>
</a>
...
...

I need to click on this list of links inside a loop and scrape data from result pages. What I have done up till now is:

import time

lists = browser.find_elements_by_xpath("//a[@class='unique']")
for lis in lists:
    print(lis.text)
    lis.click()
    time.sleep(4)
    # Scrape data from this page (works fine).
    browser.back()
    time.sleep(4)

It works fine for the first iteration, but when the second iteration reaches

print(lis.text)

It throws an error saying:

StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

I have tried `print(lists)` and it gives the list of all the link elements, so that part works fine. The problem occurs when the browser comes back to the previous page. I have tried extending the wait time and using browser.get(...) instead of browser.back(), but the error remains. I don't get why it will not print lis.text, because lists still contains all the elements. Any help would be greatly appreciated.

  • This is expected behavior: when the webpage is reloaded, the reference to the WebElement becomes stale. In your case, you could save a list of links (`href` values) instead of a list of `a` elements, and then go to each link directly instead of returning to the original page. – Kamal Feb 08 '19 at 10:16
  • So I need to run the loop twice: first to save the hrefs, then to go to each link directly. Thanks for the help. – A.Hamza Feb 08 '19 at 10:32

1 Answer


You are trying to reuse element references after the page has been reloaded, which is exactly what makes them stale.

Clicking each link, scraping the data, and navigating back is fragile, because every page load invalidates the previously located elements. Instead, store all the links in a list first, then navigate to each one with the driver.get('some link') method and scrape the data there. That way you avoid the exception. Try the modified code below:

from time import sleep

# Locate the anchor elements and load them into a list
lists = browser.find_elements_by_xpath("//a[@class='unique']")

# Collect the href values while the elements are still attached to the page
links = []
for lis in lists:
    link = lis.get_attribute('href')
    print(link)
    links.append(link)

# Navigate to each stored link, one by one
for link in links:
    browser.get(link)
    # Scrape here
    sleep(3)
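Since the stored links are plain strings rather than WebElement references, they stay valid across any number of page loads, so nothing can go stale in the second loop.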

Or, if you want to keep your original click-and-go-back logic, you can use a fluent-style wait (a WebDriverWait with a poll frequency and a list of ignored exceptions) to ride out exceptions such as StaleElementReferenceException, like below:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException

# Poll every second for up to 10 seconds, ignoring staleness between polls
wait = WebDriverWait(browser, 10, poll_frequency=1,
                     ignored_exceptions=[StaleElementReferenceException])
element = wait.until(EC.element_to_be_clickable((By.XPATH, "xPath that you want to click")))
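Here is a minimal sketch of how such a wait could be combined with the original click-and-go-back loop, assuming the same //a[@class='unique'] locator from the question. The key is to re-locate the links after every browser.back() instead of reusing stale references, and to click by index:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException

wait = WebDriverWait(browser, 10, poll_frequency=1,
                     ignored_exceptions=[StaleElementReferenceException])

# Count the links once; the count does not go stale, only the elements do
count = len(browser.find_elements_by_xpath("//a[@class='unique']"))

for i in range(count):
    # Re-locate the list after each page load; the old references are stale
    links = wait.until(EC.presence_of_all_elements_located(
        (By.XPATH, "//a[@class='unique']")))
    print(links[i].text)
    links[i].click()
    # Scrape data from the result page here
    browser.back()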

I hope it helps...

Ali