
We are working on extracting the image source address from the page.

<div class="product-row">
  <div class="product-item">
  <div class="product-picture"><img src="https://t3a.coupangcdn.com/thumbnails/remote/212x212ex/image/vendor_inventory/6ca9/2e097d911efc291473d0c47052cdc8f42d7b7b8f2a3ebbb0ccc974d76fe4.jpg" alt="product"><div><button type="button" class="ant-btn hover-btn btn-open-detail">
  </div></div>
  <div class="product-item">
  <div class="product-picture">
  <img src="https://thumbnail11.coupangcdn.com/thumbnails/remote/212x212ex/image/retail/images/239519218793467-6edc7d92-4165-4476-a528-fa238ffeeeb6.jpg" alt="product"><div></div></div>

I tried to get it in the following way:

ele = driver.find_elements_by_xpath("//div[@class='product-picture']/img")
print(ele)

Output:

<selenium.webdriver.remote.webelement.WebElement (session="d9fd08b93bd5dd83fe520826c1f6fd77", element="27ef8c33-624d-4166-9dc7-3a355c4dcc32")>
<selenium.webdriver.remote.webelement.WebElement (session="d9fd08b93bd5dd83fe520826c1f6fd77", element="a6d77107-fecf-4c84-a048-9b4bda39b9df")>
<selenium.webdriver.remote.webelement.WebElement (session="d9fd08b93bd5dd83fe520826c1f6fd77", element="1f62cb8b-df58-4f06-afe6-6c60cb572527")>

I want the image source address string of every <div class="product-picture"> element on the page. Is there a way to extract a string?

Peter Mortensen
anfwkdrn

3 Answers


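Use the get_attribute('src') method to read the src value. Note that find_elements_by_xpath returns a list of elements, so get_attribute has to be called on each element, not on the list itself:

sources = [img.get_attribute('src') for img in driver.find_elements_by_xpath("//div[@class='product-picture']/img")]
print(sources)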
Md. Fazlul Hoque
from selenium.webdriver.common.by import By

images = driver.find_elements(By.XPATH, "//div[@class='product-picture']/img")
for img in images:
    print(img.get_attribute("src"))

This will give you the expected output:

https://t3a.coupangcdn.com/thumbnails/remote/212x212ex/image/vendor_inventory/6ca9/2e097d911efc291473d0c47052cdc8f42d7b7b8f2a3ebbb0ccc974d76fe4.jpg
https://thumbnail11.coupangcdn.com/thumbnails/remote/212x212ex/image/retail/images/239519218793467-6edc7d92-4165-4476-a528-fa238ffeeeb6.jpg
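
If the URLs are needed as a list of strings rather than printed output, the loop above can be wrapped in a small helper. This is only a sketch: collect_image_sources is a hypothetical name, and it assumes driver is an already-started Selenium WebDriver. The locator strategy is passed as the plain string "xpath", which is the value of By.XPATH, so the snippet itself needs no Selenium import.

```python
def collect_image_sources(driver, xpath="//div[@class='product-picture']/img"):
    """Return the non-empty src values of every <img> matched by xpath.

    collect_image_sources is a hypothetical helper name; driver is assumed
    to be a Selenium WebDriver (or any object exposing find_elements).
    """
    # "xpath" is the string value of selenium.webdriver.common.by.By.XPATH,
    # so this is equivalent to driver.find_elements(By.XPATH, xpath).
    elements = driver.find_elements("xpath", xpath)

    sources = []
    for element in elements:
        src = element.get_attribute("src")
        # get_attribute returns None when the attribute is missing,
        # so skip those elements instead of collecting None values.
        if src:
            sources.append(src)
    return sources
```

Calling collect_image_sources(driver) would then return a plain Python list of URL strings, which can be printed, saved, or passed on to a downloader.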
Peter Mortensen
Himanshu Poddar

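You are using deprecated syntax: in Selenium 4 the find_element_by_* and find_elements_by_* methods are deprecated in favor of find_element(By.<strategy>, value) and find_elements(By.<strategy>, value). Please see Python Selenium warning "DeprecationWarning: find_element_by_* commands are deprecated"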

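For elements that are likely to be lazy-loaded, a more reliable approach is to wait explicitly (here for up to 10 seconds) until they are present in the DOM before reading their attributes: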

images = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='product-picture']/img")))
for i in images:
    print(i.get_attribute('src'))

You will also need the following imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Selenium docs can be found at https://www.selenium.dev/documentation/

Barry the Platipus