So this question has been asked before, but I am still struggling to get it working.

The webpage has a table with links, and I want to iterate through it, clicking each of the links.

So this is my code so far:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path=r'C:\Users\my_path\chromedriver_96.exe')
driver.get(r"https://www.fidelity.co.uk/shares/ftse-350/")

try:
    element = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CLASS_NAME, "table-scroll")))

    table = element.find_elements_by_xpath("//table//tbody/tr")
 
    for row in table[1:]:
        print(row.get_attribute('innerHTML'))
        # link.click()

finally:
    driver.close()

Sample of the output:

            <td>FOUR</td>
            <td><a href="/factsheets/4IMPRINT-GROUP/GB0006640972-GBP/?id=GB0006640972GBP&amp;idType=isin&amp;marketCode=&amp;idCurrencyid=" target="_parent">4imprint Group plc</a></td>
            <td>Media &amp; Publishing</td>
        

            <td>888</td>
            <td><a href="/factsheets/888-HOLDINGS/GI000A0F6407-GBP/?id=GI000A0F6407GBP&amp;idType=isin&amp;marketCode=&amp;idCurrencyid=" target="_parent">888 Holdings</a></td>
            <td>Hotels &amp; Entertainment Services</td>
        

            <td>ASL</td>
            <td><a href="/factsheets/ABERFORTH-SMALLER-COMPANIES-TRUST/GB0000066554-GBP/?id=GB0000066554GBP&amp;idType=isin&amp;marketCode=&amp;idCurrencyid=" target="_parent">Aberforth Smaller Companies Trust</a></td>
            <td>Collective Investments</td>


How do I click the href and iterate to the next href?

Many thanks.

Edit: I went with this solution (a few small tweaks on Prophet's solution):

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException
import time
from selenium.webdriver.common.action_chains import ActionChains


driver = webdriver.Chrome(executable_path=r'C:\Users\my_path\chromedriver_96.exe')
driver.get(r"https://www.fidelity.co.uk/shares/ftse-350/")
actions = ActionChains(driver)
#close the cookies banner
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "ensCloseBanner"))).click()
#wait for the first link in the table
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
#extra wait to make all the links loaded
time.sleep(1)
#get the total links amount
links = driver.find_elements_by_xpath('//table//tbody/tr/td/a') 

for index, val in enumerate(links):
    try:
        #get the links again after getting back to the initial page in the loop
        links = driver.find_elements_by_xpath('//table//tbody/tr/td/a')
        #scroll to the n-th link, it may be out of the initially visible area
        actions.move_to_element(links[index]).perform()
        links[index].click()
        #scrape the data on the new page and get back with the following command
        driver.execute_script("window.history.go(-1)") #you can alternatively use this as well: driver.back()
        WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
        time.sleep(2)
    except StaleElementReferenceException:  
        pass
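
Prophet points out below that silently passing on the exception just skips that row. A retry variant (just a sketch, continuing from the setup above and not tested against the live page) would re-locate the link and try again instead of moving on:

for index in range(len(links)):
    for attempt in range(3):
        try:
            #re-locate on every attempt so the reference is fresh
            links = driver.find_elements_by_xpath('//table//tbody/tr/td/a')
            actions.move_to_element(links[index]).perform()
            links[index].click()
            #scrape here, then return to the table
            driver.back()
            WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
            break #this row is done, move on to the next
        except StaleElementReferenceException:
            time.sleep(1) #let the table re-render, then retry the lookup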
  • So for example you want to first click on `4imprint Group plc`, and in the next iteration click on `888 Holdings`, and so on? If yes, what exactly do you want to do once you click the links? Do you want to scrape something? – cruisepandey Jan 12 '22 at 14:53
  • Yes, I need to get through to the next page and scrape data. Same thing for 350 pages. – Cam Jan 12 '22 at 14:56
  • What I am asking is, once you click on `4imprint Group plc`, what is the expectation? You do not just want to click on the links, right? – cruisepandey Jan 12 '22 at 14:58
  • No, I want to go to the next page and scrape some data. Then return and click the next link and go to that page and scrape. Repeat. – Cam Jan 12 '22 at 15:01
  • Okay, so are you okay with just clicking on a link, going back to the main page, and then clicking the next link, until you have clicked all the links? If you are okay with that approach, I can work on this ticket. – cruisepandey Jan 12 '22 at 15:04
  • That was the approach I was working on. Is there a better way I should have considered? – Cam Jan 12 '22 at 15:36

2 Answers

To perform what you want to do here, you first need to close the cookies banner at the bottom of the page.
Then you can iterate over the links in the table.
Since clicking each link opens a new page, after scraping the data there you have to get back to the main page and grab the next link. You cannot simply collect all the links into a list up front and iterate over that list, because navigating to another page makes every element Selenium grabbed on the initial page stale.
Your code can be something like this:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time


driver = webdriver.Chrome(executable_path=r'C:\Users\my_path\chromedriver_96.exe')
driver.get(r"https://www.fidelity.co.uk/shares/ftse-350/")
actions = ActionChains(driver)
#close the cookies banner
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "ensCloseBanner"))).click()
#wait for the first link in the table
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
#extra wait to make all the links loaded
time.sleep(1)
#get the total links amount
links = driver.find_elements_by_xpath('//table//tbody/tr/td/a') 
for index, val in enumerate(links):
    #get the links again after getting back to the initial page in the loop
    links = driver.find_elements_by_xpath('//table//tbody/tr/td/a')
    #scroll to the n-th link, it may be out of the initially visible area
    actions.move_to_element(links[index]).perform()
    links[index].click()
    #scrape the data on the new page and get back with the following command
    driver.execute_script("window.history.go(-1)") #you can alternatively use this as well: driver.back()
    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
    time.sleep(1)
  • Thanks for this. There is a problem on the enumerate line: ```TypeError: object of type 'WebElement' has no len()``` – Cam Jan 12 '22 at 15:44
  • Sure, my bad. It should be `find_elements_by_xpath` there, not `find_element_by_xpath` – Prophet Jan 12 '22 at 15:45
  • So links is returning a list of ```selenium.webdriver.remote.webelement.WebElement``` objects but then I get this error ```TypeError: 'int' object is not iterable``` with the enumerate line. – Cam Jan 12 '22 at 15:53
  • I'm sorry, again my typo... I wrote that too quickly. Fixed that – Prophet Jan 12 '22 at 16:06
  • Please let me know if it works correctly now – Prophet Jan 12 '22 at 16:36
  • Still fighting with it. ```actions.move_to_element(link[index]).perform()``` does not work: ```TypeError: 'int' object is not iterable```. I can get it to work by just using val, ```actions.move_to_element(val).perform()```, but then the link refresh is not working and I get a ```stale element reference``` error. – Cam Jan 12 '22 at 16:40
  • Sure, it should be `links` there... – Prophet Jan 12 '22 at 16:44
  • If there are still any problems - don't hesitate, just let me know. – Prophet Jan 12 '22 at 16:46
  • Haha, my bad this time: I missed the 's' off the second ```find_elements_by_xpath```. So yes, that works, but like I said I then hit the ```StaleElementReferenceException: Message: stale element reference: element is not attached to the page document``` error. I have tried adding an ```implicitly_wait``` but no luck. – Cam Jan 12 '22 at 16:50
  • `implicitly_wait` will never help with `StaleElementReferenceException`. Where, on which code line, do you see the `StaleElementReferenceException`? – Prophet Jan 12 '22 at 16:53
  • On ```actions.move_to_element(links[index]).perform()```. I get two full iterations out of it before I get the error. – Cam Jan 12 '22 at 16:57
  • Right, I think I have a working solution using a try/except block with ```StaleElementReferenceException``` set to pass. – Cam Jan 12 '22 at 17:06
  • I don't think `pass` in the `except` part of a `try` is a good solution here... What is wrong with my code now? – Prophet Jan 12 '22 at 17:08
  • I added it to solve the ```StaleElementReferenceException``` error. – Cam Jan 12 '22 at 17:09
  • But `pass` doesn't actually resolve the problem, it just continues the flow. You should never see `StaleElementReferenceException`, and if it comes up we should fix our Selenium code – Prophet Jan 12 '22 at 17:11

You basically have to do the following:

  1. Click on the cookies button if available.
  2. Get all the links on the page.
  3. Iterate over the links: re-locate the list on each pass, scroll the n-th element into view, click it, and then navigate back to the original screen.

Code:

driver = webdriver.Chrome(driver_path)
driver.maximize_window()
wait = WebDriverWait(driver, 30)

driver.get("https://www.fidelity.co.uk/shares/ftse-350/")

try:
    wait.until(EC.element_to_be_clickable((By.ID, "ensCloseBanner"))).click()
    print('Click on the cookies button')
except:
    print('Could not click on the cookies button')
    pass

driver.execute_script("window.scrollTo(0, 750)")

try:
    all_links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//table//tbody/tr/td/a")))
    print("We have got to deal with", len(all_links), 'links')
    j = 0
    for link in range(len(all_links)):
        links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//table//tbody/tr/td/a")))
        driver.execute_script("arguments[0].scrollIntoView(true);", links[j])
        time.sleep(1)
        links[j].click()
        # here write the code to scrape something once the click is performed
        time.sleep(1)
        driver.execute_script("window.history.go(-1)")
        j = j + 1
        print(j)
except:
    print('Bot could not execute all the links properly')
    pass

Imports:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time

P.S. To handle stale element references, you have to define the list of web elements again inside the loop.
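
An alternative that sidesteps staleness entirely is to collect the href strings up front: plain strings never go stale, so you can navigate with `driver.get()` instead of clicking and going back. A rough sketch (not tested against this page; it assumes every table link exposes a usable href, as the sample output in the question suggests, and reuses the `driver_path` placeholder from above):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(driver_path)
driver.get("https://www.fidelity.co.uk/shares/ftse-350/")
wait = WebDriverWait(driver, 30)

# wait for the table links, then read their hrefs as plain strings
wait.until(EC.presence_of_all_elements_located((By.XPATH, "//table//tbody/tr/td/a")))
urls = [a.get_attribute("href")
        for a in driver.find_elements_by_xpath("//table//tbody/tr/td/a")]

for url in urls:
    driver.get(url)  # no click, no back-navigation, nothing to go stale
    # scrape the factsheet page here

Since each row's link is a plain href (see the sample output in the question), direct navigation loads the same factsheet pages without any re-locating inside the loop.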
