
I'm having trouble extracting listing data from a dynamic car website:

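    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    def scrape(url):
        # Selenium 4 takes the driver path through a Service object
        browser = webdriver.Chrome(service=Service('./chromedriver'))
        browser.get(url)
        soup = BeautifulSoup(browser.page_source, "lxml")
        browser.quit()

        # each listing is an <article class="serp-item list">
        cars = soup.find_all('article', class_='serp-item list')
        for car in cars:
            car_url = car.find('a')['href']
            img = car.img['src']
            print("IMG: " + img)
            print("URL: " + car_url)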

For testing, I first print the URLs I want to download from. The car URL is valid, but the image URL comes out as a placeholder: `IMG: /img/empty.png`.
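The `/img/empty.png` value suggests the site lazy-loads its thumbnails: the real URL is usually kept in a different attribute and only copied into `src` by JavaScript once the image scrolls into view. A minimal sketch, assuming that pattern: the attribute names in `CANDIDATE_ATTRS` are guesses (inspect the real `<img>` markup in DevTools to find the right one), and `pick_image_url` is a hypothetical helper, not part of BeautifulSoup.

```python
from urllib.parse import urljoin

# Placeholder the site puts in src before its lazy-loading script runs
PLACEHOLDER = "/img/empty.png"

# Hypothetical attribute names: common lazy-loading conventions, not confirmed
# for this site. "src" comes last so a real src is still used if present.
CANDIDATE_ATTRS = ("data-src", "data-original", "data-lazy-src", "src")


def pick_image_url(attrs, page_url):
    """Return the first non-placeholder image URL in an <img> tag's attributes.

    `attrs` is a plain dict, e.g. BeautifulSoup's `tag.attrs`.
    Relative URLs are resolved against `page_url`; returns None if only
    the placeholder is found.
    """
    for name in CANDIDATE_ATTRS:
        value = attrs.get(name)
        if value and value != PLACEHOLDER:
            return urljoin(page_url, value)
    return None
```

Inside the loop this would replace `car.img['src']` with something like `img = pick_image_url(car.img.attrs, url)`. If it still returns `None`, the real URL is not in the HTML at all and has to be loaded by the browser first.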

I also tried scrolling through the whole page first, but it made no difference:

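    import time

    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    # inside scrape(), right after browser.get(url):
    elem = browser.find_element(By.ID, 'serp-wrapper')
    elem.click()
    no_of_pagedowns = 10
    while no_of_pagedowns:
        elem.send_keys(Keys.PAGE_DOWN)  # scroll one screen down
        time.sleep(0.2)                 # give the page time to react
        no_of_pagedowns -= 1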
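Sending PAGE_DOWN to a container element does not necessarily scroll the window, so the lazy-load handlers may never fire. A sketch of an alternative, assuming the images load when they enter the viewport: scroll each `<img>` into view with JavaScript, then poll its `src` until the placeholder is replaced. `load_lazy_images`, its selector and its timeouts are hypothetical; `"css selector"` is the string value of Selenium's `By.CSS_SELECTOR`. Note that Selenium's `get_attribute("src")` returns the resolved absolute URL, which is why the placeholder check uses `endswith`.

```python
import time

PLACEHOLDER = "/img/empty.png"


def load_lazy_images(driver, selector="article.serp-item.list img",
                     timeout=5.0, poll=0.2):
    """Scroll each matching <img> into view and wait for its real src.

    Returns one URL per image; an image that is still the placeholder
    after `timeout` seconds is returned as-is.
    """
    urls = []
    # "css selector" == selenium.webdriver.common.by.By.CSS_SELECTOR
    for img in driver.find_elements("css selector", selector):
        # bring the image into the viewport so scroll/intersection handlers fire
        driver.execute_script(
            "arguments[0].scrollIntoView({block: 'center'});", img)
        deadline = time.monotonic() + timeout
        src = img.get_attribute("src")
        # poll until the lazy-loader swaps in the real URL or we time out
        while (src is None or src.endswith(PLACEHOLDER)) and time.monotonic() < deadline:
            time.sleep(poll)
            src = img.get_attribute("src")
        urls.append(src)
    return urls
```

Call it with the live `browser` before `browser.quit()`. Selenium's `WebDriverWait` could replace the manual polling loop; plain polling just keeps the sketch free of extra imports.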
  • `page_source` seems to be the initial HTML without JavaScript modifications to the DOM. Try https://stackoverflow.com/a/61835835/987358 instead. – Michael Butscher Jun 02 '23 at 12:44
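Along the lines of that comment, one way to make sure BeautifulSoup sees the DOM after JavaScript has modified it (a sketch, not necessarily what the linked answer does) is to serialize the current document with JavaScript. `current_dom_html` is a hypothetical helper name.

```python
def current_dom_html(driver):
    """Return the HTML of the DOM as it is right now, including JS changes."""
    # document.documentElement.outerHTML serializes the live <html> element
    # (the doctype is not included, which BeautifulSoup does not need)
    return driver.execute_script("return document.documentElement.outerHTML;")
```

Used after scrolling/waiting and before `browser.quit()`: `soup = BeautifulSoup(current_dom_html(browser), "lxml")`.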
