Extract all the src attributes from the images of an Amazon product page in Selenium/Python

Question

I'm using Selenium to scrape details from an Amazon product page ([Example][1]). I've successfully scraped the product title, but I also want to get the URLs of all of the product images. Here is my code:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

def search_amazon():
    driver = webdriver.Chrome(ChromeDriverManager().install())
    driver.get('https://www.amazon.com/Pendleton-Glacier-National-Queen-Blanket/dp/B003EQ4AYY/?_encoding=UTF8&pd_rd_w=dZURJ&pf_rd_p=ab102187-3a5a-49fd-b43f-4f928775aeae&pf_rd_r=PD8YGV8XA34FMYH7G9TJ&pd_rd_r=2cb55e9c-812a-43de-bf52-7e1976f5374b&pd_rd_wg=KmkoW&ref_=pd_gw_hfp13n_bbn')
    productName = driver.find_element_by_id('productTitle').text
    print(productName)
    imgList = driver.find_element_by_xpath('//*[@id="altImages"]/ul')
    options = imgList.find_elements_by_tag_name("li")

    for option in options:
        print(option.get_attribute("innerHTML"))

search_amazon()

The loop at the end prints the innerHTML of each LI. However, I'm unable to access the IMG src. This is what I've attempted:

for option in options:
    src = option.find_element_by_tag_name("img").get_attribute("src")

This throws a NoSuchElementException:

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"img"}

score 1 · Answer 1 · answered Jan 10 '21 at 21:41

To get the full-size images rather than the thumbnails, I hovered over each thumbnail and read the src of the main image that becomes selected. Adding explicit waits would make this more reliable.

from selenium.webdriver.common.action_chains import ActionChains
...

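for thumb in driver.find_elements_by_css_selector('#altImages .imageThumbnail'):
    # Hovering over a thumbnail makes its full-size image the selected one
    ActionChains(driver).move_to_element(thumb).perform()
    # Read the src of the currently selected main image
    print(driver.find_element_by_css_selector('.image.item.maintain-height.selected img').get_attribute('src'))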

This prints the src of each full-size image instead of the thumbnail.

score 0 · Answer 2 · answered Jan 01 '21 at 21:13

When you collect the li elements, specify the class of the element in your XPath, because not every li under //*[@id="altImages"]/ul contains an image. To get the URLs, you can do this:

def search_amazon():
    driver = webdriver.Chrome(ChromeDriverManager().install())
    driver.get('https://www.amazon.com/Pendleton-Glacier-National-Queen-Blanket/dp/B003EQ4AYY/?_encoding=UTF8&pd_rd_w=dZURJ&pf_rd_p=ab102187-3a5a-49fd-b43f-4f928775aeae&pf_rd_r=PD8YGV8XA34FMYH7G9TJ&pd_rd_r=2cb55e9c-812a-43de-bf52-7e1976f5374b&pd_rd_wg=KmkoW&ref_=pd_gw_hfp13n_bbn')
    productName = driver.find_element_by_id('productTitle').text
    print(productName)
    imgList = driver.find_element_by_xpath('//*[@id="altImages"]/ul')
    options = imgList.find_elements_by_xpath(".//li[contains(@class, 'imageThumbnail')]")

    for option in options:
        print(option.find_element_by_tag_name("img").get_attribute("src"))
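
The point about li elements without an image can be checked offline. The following is a minimal sketch using only the standard library; the markup in SAMPLE_HTML is hypothetical and much simpler than the real page, and ThumbnailSrcParser is an illustrative name, not part of Selenium.

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified markup standing in for the thumbnail strip.
SAMPLE_HTML = """
<div id="altImages"><ul>
  <li class="a-spacing-small template"><span>placeholder, no image</span></li>
  <li class="a-spacing-small item imageThumbnail"><img src="https://example.com/a.jpg"></li>
  <li class="a-spacing-small item imageThumbnail"><img src="https://example.com/b.jpg"></li>
  <li class="a-spacing-small videoCountTemplate"><span>video</span></li>
</ul></div>
"""


class ThumbnailSrcParser(HTMLParser):
    """Collect img src values found inside li elements with the imageThumbnail class."""

    def __init__(self):
        super().__init__()
        self.li_total = 0
        self.srcs = []
        self._in_thumbnail = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self.li_total += 1
            # Token match on the class list; XPath's contains(@class, ...)
            # is a looser substring match.
            classes = (attrs.get("class") or "").split()
            self._in_thumbnail = "imageThumbnail" in classes
        elif tag == "img" and self._in_thumbnail and attrs.get("src"):
            self.srcs.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_thumbnail = False


parser = ThumbnailSrcParser()
parser.feed(SAMPLE_HTML)
print(parser.li_total, "li elements,", len(parser.srcs), "with a thumbnail image")
print(parser.srcs)
```

In this sample only two of the four li elements hold an img, which is the situation in which calling find_element_by_tag_name("img") on every li raises NoSuchElementException.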

score 0 · Accepted Answer · answered Jan 01 '21 at 21:42

To print the value of the src attribute of each <img> tag, use WebDriverWait with visibility_of_all_elements_located() and either of the following locator strategies:

Using CSS_SELECTOR:

driver.get('https://www.amazon.com/Pendleton-Glacier-National-Queen-Blanket/dp/B003EQ4AYY/?_encoding=UTF8&pd_rd_w=dZURJ&pf_rd_p=ab102187-3a5a-49fd-b43f-4f928775aeae&pf_rd_r=PD8YGV8XA34FMYH7G9TJ&pd_rd_r=2cb55e9c-812a-43de-bf52-7e1976f5374b&pd_rd_wg=KmkoW&ref_=pd_gw_hfp13n_bbn')
print([my_elem.get_attribute("src") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div#altImages>ul li[data-ux-click] img")))])

Using XPATH:

driver.get('https://www.amazon.com/Pendleton-Glacier-National-Queen-Blanket/dp/B003EQ4AYY/?_encoding=UTF8&pd_rd_w=dZURJ&pf_rd_p=ab102187-3a5a-49fd-b43f-4f928775aeae&pf_rd_r=PD8YGV8XA34FMYH7G9TJ&pd_rd_r=2cb55e9c-812a-43de-bf52-7e1976f5374b&pd_rd_wg=KmkoW&ref_=pd_gw_hfp13n_bbn')
print([my_elem.get_attribute("src") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@id='altImages']/ul//li[@data-ux-click]//img")))])

Console Output:

['https://images-na.ssl-images-amazon.com/images/I/41Sj%2BO--J9L._AC_US40_.jpg', 'https://images-na.ssl-images-amazon.com/images/I/41iX14X%2BoRL._AC_US40_.jpg', 'https://images-na.ssl-images-amazon.com/images/I/41wiU-3N5JL._AC_US40_.jpg', 'https://images-na.ssl-images-amazon.com/images/I/41waNtDjTxL._AC_US40_.jpg']

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
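
The WebDriverWait calls above work by polling: the condition is evaluated repeatedly until it returns something truthy or the timeout (20 seconds above) expires. Here is a minimal sketch of that idea in plain Python. It is not Selenium's implementation; wait_until, thumbnails_ready and the sample file names are hypothetical, and Selenium raises its own TimeoutException rather than the built-in TimeoutError used here.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical stand-in for a page whose thumbnails appear on the third check.
attempts = {"count": 0}


def thumbnails_ready():
    attempts["count"] += 1
    if attempts["count"] < 3:
        return []  # falsy: nothing rendered yet
    return ["thumb-1.jpg", "thumb-2.jpg"]


found = wait_until(thumbnails_ready, timeout=2.0, poll_interval=0.01)
print(found)
```

The benefit over a fixed sleep is that the wait ends as soon as the elements are available, and a missing element surfaces as a timeout error instead of an empty result.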

score 0 · Answer 4 · answered May 18 '23 at 18:32

To scrape the full-size images from an Amazon product page, click each thumbnail and read the src of the selected main image:

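import time

from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By

# Assumes `driver` is an existing WebDriver with the product page already loaded
image_url = []
for i in range(1, 15):
    try:
        # Click the i-th thumbnail so its full-size image becomes selected
        driver.find_element(By.XPATH, '*//ul/li[' + str(i) + ']/span/span/span/input').click()
        time.sleep(3)  # crude pause to let the main image update
        main = driver.find_element(By.CSS_SELECTOR, '.image.item.maintain-height.selected img').get_attribute('src')
        image_url.append(main)
        print(main)
    except WebDriverException:
        # Skip positions that have no clickable thumbnail
        pass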
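
A loop like this can record the same src more than once, for example if a click leaves the previously selected image in place. Whether that happens on a given page is an assumption. If it does, a small helper can clean up the collected list afterwards; unique_in_order and the sample URLs below are hypothetical.

```python
def unique_in_order(urls):
    """Drop empty values and repeats while keeping first-seen order."""
    return list(dict.fromkeys(url for url in urls if url))


# Hypothetical values, as if two clicks had left the same main image selected.
collected = [
    "https://example.com/full-1.jpg",
    "https://example.com/full-2.jpg",
    "https://example.com/full-2.jpg",
    None,  # get_attribute returns None when the attribute is missing
    "https://example.com/full-3.jpg",
]
print(unique_in_order(collected))
```

dict.fromkeys keeps insertion order, so the result follows the order in which the thumbnails were visited.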
