
I am trying to append car links from a website to a list, and then traverse that list to get information from each car's page.

So far I have tried both the .append method and the += operator, but I get the same error for both, which is:

AttributeError: 'str' object has no attribute 'get_attribute'

This only shows up when I use the following line of code:

carLinks += [carLink.get_attribute("href")]

or with the append method. However, if I just print carLink.get_attribute("href"), it prints all the links.

This is the partial code I used:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://www.tred.com/buy?body_style=&distance=50&exterior_color_id=&make=&miles_max=100000&miles_min=0&model=&page_size=24&price_max=100000&price_min=0&query=&requestingPage=buy&sort=desc&sort_field=updated&status=active&year_end=2022&year_start=1998&zip=")


carLinks = []

carLinks = driver.find_elements_by_css_selector("div.grid-box-container a")
for carLink in carLinks:
    carLinkUrl = carLink.get_attribute("href")
    carLinks.append(carLinkUrl)
    # print(carLinkUrl)

print(carLinks)

driver.quit()

I haven't tried it with BeautifulSoup yet, as I am not used to mixing Selenium and BeautifulSoup.
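From what I've read, the usual way to combine them is to let Selenium load the page and then hand driver.page_source to BeautifulSoup for parsing. A rough, untested sketch of what I think that would look like (same URL and CSS selector as above; note I haven't added any wait here, so the grid may not be in the page source yet):

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://www.tred.com/buy?body_style=&distance=50&exterior_color_id=&make=&miles_max=100000&miles_min=0&model=&page_size=24&price_max=100000&price_min=0&query=&requestingPage=buy&sort=desc&sort_field=updated&status=active&year_end=2022&year_start=1998&zip=")

# hand the HTML rendered by the browser over to BeautifulSoup
soup = BeautifulSoup(driver.page_source, "html.parser")

# same selector as in the Selenium-only version
carLinks = [a.get("href") for a in soup.select("div.grid-box-container a")]
print(carLinks)

driver.quit()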

vitaliis

4 Answers


You have to add a wait / delay to let the page elements load before accessing them.
Without it, driver.find_elements_by_css_selector("div.grid-box-container a") called immediately after driver.get("https://www.tred.com/buy?body_style=&distance=50&exterior_color_id=&make=&miles_max=100000&miles_min=0&model=&page_size=24&price_max=100000&price_min=0&query=&requestingPage=buy&sort=desc&sort_field=updated&status=active&year_end=2022&year_start=1998&zip=") returns an empty list, which is what ends up in carLinks.
This should work better:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(ChromeDriverManager().install())
wait = WebDriverWait(driver, 20)

driver.get("https://www.tred.com/buy?body_style=&distance=50&exterior_color_id=&make=&miles_max=100000&miles_min=0&model=&page_size=24&price_max=100000&price_min=0&query=&requestingPage=buy&sort=desc&sort_field=updated&status=active&year_end=2022&year_start=1998&zip=")


carLinks = []
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.grid-box-container a")))

# keep the web elements under their own name so carLinks only collects the URL strings
carLinkElements = driver.find_elements_by_css_selector("div.grid-box-container a")
for carLink in carLinkElements:
    carLinkUrl = carLink.get_attribute("href")
    carLinks.append(carLinkUrl)
    # print(carLinkUrl)

print(carLinks)

driver.quit()
Prophet

This is because you reuse the name carLinks. It starts out as your empty list, but you immediately reassign it to the list of web elements returned by find_elements_by_css_selector. Then, in your loop:

for carLink in carLinks:
    carLinkUrl = carLink.get_attribute("href")
    carLinks.append(carLinkUrl)

you append the href strings to the very list you are iterating over.

So the loop eventually reaches one of the strings you just appended, and a plain str has no get_attribute method, which is exactly the AttributeError you are seeing.
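You can reproduce the same failure with plain Python, no Selenium involved (FakeElement below is just a stand-in I am using for a web element):

class FakeElement:
    # stand-in for a Selenium WebElement, only for this demonstration
    def get_attribute(self, name):
        return "https://example.com/car"

carLinks = [FakeElement(), FakeElement()]
for carLink in carLinks:
    # every appended string becomes a later item of the same loop
    carLinks.append(carLink.get_attribute("href"))

# the third iteration picks up the first appended string and raises:
# AttributeError: 'str' object has no attribute 'get_attribute'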

Please change one of the two names:

carLinks = []

links = driver.find_elements_by_css_selector("div.grid-box-container a")
for car_link in links:
    carLinks.append(car_link.get_attribute('href'))

print(carLinks)
cruisepandey

I noticed that your list 'carLinks' shares its name with the result of driver.find_elements_by_css_selector. At first the name refers to a list you can append to, but you then rebind it to Selenium's list of web elements, and the values you append inside the loop are plain strings, which is where the str error comes from.

Could this be the issue? I'd suggest renaming that list.
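Something along these lines, for example (untested; linkElements is just a name I made up for the web elements):

carLinks = []  # only the href strings go in here

linkElements = driver.find_elements_by_css_selector("div.grid-box-container a")
for carLink in linkElements:
    carLinks.append(carLink.get_attribute("href"))

print(carLinks)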

Quick side note: check whether the website allows web scraping. I recall a site called autoscout having some legal issues over something similar.


So I found this link where the guy used a range() for loop rather than iterating over the list of web elements directly. The cause is probably the name reuse cruisepandey described, or maybe the elements weren't loaded yet, as Prophet said. It works fine now: range(len(carLinks)) is evaluated once up front, so the loop only visits the original web elements even though the list keeps growing as the hrefs are appended.

I changed the code to this:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://www.tred.com/buy?body_style=&distance=50&exterior_color_id=&make=&miles_max=100000&miles_min=0&model=&page_size=24&price_max=100000&price_min=0&query=&requestingPage=buy&sort=desc&sort_field=updated&status=active&year_end=2022&year_start=1998&zip=")

carLinks = []

# carLinks starts out holding the web elements; the href strings get appended after them
carLinks = driver.find_elements_by_css_selector("div.grid-box-container a")
# range(len(...)) is evaluated once, so the loop only touches the original elements
for i in range(len(carLinks)):
    carLinks.append(carLinks[i].get_attribute('href'))
    
print(carLinks)

driver.quit()

I even removed the carLink variable to make it shorter.

Tomerikoo
    As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Oct 11 '21 at 15:57