
My goal is to go to this specific website, click each of the products, allow enough time to scrape the data from the clicked product page, then go back and click another product, until every product on the page has been clicked through and scraped (the scraping code is not included here).

My code opens Chrome, navigates to the desired website, and builds a list of links to click by class name. This is the part I am stuck on: I believe I need a for-loop to iterate through the list of links, clicking each one and going back to the original page, but I can't figure out why this won't work.

Here is my code:

import csv
import time
from selenium import webdriver
import selenium.webdriver.chrome.service as service
import requests
from bs4 import BeautifulSoup


url = "https://www.vatainc.com/infusion/adult-infusion.html?limit=all"
service = service.Service('path to chromedriver')
service.start()
capabilities = {'chrome.binary': 'path to chrome'}
driver = webdriver.Remote(service.service_url, capabilities)
driver.get(url)
time.sleep(2)
links = driver.find_elements_by_class_name('product-name')


for link in links:
    link.click()
    driver.back()
    link.click()
Jonathan
  • Haven't tested your code, but there seems to be a mistake in the for loop: the second link.click() should not be there, I guess... – kaihami Nov 20 '18 at 17:30
  • Yes, I had that before. It gives the error 'Message: stale element reference: element is not attached to the page document'. I'm not sure why, though; I have multiple elements in the links variable. – Jonathan Nov 20 '18 at 17:33
  • Regarding your "stale element" error: https://stackoverflow.com/questions/44630912/how-to-use-selenium-to-click-through-multiple-elements-while-avoiding-stale-elem After the first link is clicked you no longer have access to the previous page; you need to re-evaluate (find) the links to click. – Clément Denoix Nov 20 '18 at 17:39
  • @ClémentDenoix That is a good point; I am curious how I would find the links again. I tried for link in links: link.click(); driver.back(); time.sleep(2); driver.find_elements_by_class_name('product-name') but I still get the same stale reference error. – Jonathan Nov 20 '18 at 17:56
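
A minimal sketch of the re-finding approach suggested in the comments (untested against this site): after every driver.back() the previously found WebElement references go stale, so the list has to be re-fetched on each iteration and the element picked by index. It reuses the driver and the Selenium 3 find_elements_by_class_name call from the question; the sleep durations are placeholders.

import time

links = driver.find_elements_by_class_name('product-name')
for i in range(len(links)):
    # re-locate the product links on the freshly loaded listing page
    links = driver.find_elements_by_class_name('product-name')
    links[i].click()
    time.sleep(2)   # give the product page time to load before scraping
    # ... scrape the product page here ...
    driver.back()
    time.sleep(2)   # wait for the listing page to reload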

1 Answer


I have another solution to your problem.

When I tested your code it behaved strangely. I fixed the problems I ran into by locating the links with XPath and collecting their hrefs instead of clicking the elements.

# driver is the same webdriver instance set up in the question
url = "https://www.vatainc.com/infusion/adult-infusion.html?limit=all"
driver.get(url)

# collect the href of each product link instead of keeping the elements themselves
links = [x.get_attribute('href') for x in driver.find_elements_by_xpath("//*[contains(@class, 'product-name')]/a")]

# visit every product page and keep its HTML for later scraping
htmls = []
for link in links:
    driver.get(link)
    htmls.append(driver.page_source)

Instead of going back and forward, I saved all of the URLs (in links) and iterate over that list, loading each product page directly.
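
If it helps, here is one way the collected page sources could then be parsed with BeautifulSoup, which your question already imports. The h1.product-name selector is only an illustration; adjust it to the actual markup of the product pages.

from bs4 import BeautifulSoup

for html in htmls:
    soup = BeautifulSoup(html, 'html.parser')
    name = soup.select_one('h1.product-name')
    print(name.get_text(strip=True) if name else 'name not found')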

kaihami