-1

I am looking for advice on how to approach this problem. I work for Givenchy and I want to scrape all the images from https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21 in order to compile them for a photo share. The images I want are the ones that appear initially, that is, before you move your mouse over them. The distinction is important because on hover each image changes to a model wearing the bag; I want only the image of the bag itself. When I inspect the page with the Chrome inspect tool, I can only see the link for the image with the model.

Is there a way to do what I want, and if so, how?

3 Answers

1

Selenium isn't needed. The picture URL is in the srcset attribute of the <source ...> tags inside <picture>, so with the correct CSS selector and a little string manipulation you can get the picture URLs.

For example:

import requests
from bs4 import BeautifulSoup


url = 'https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for p in soup.select('picture.thumb-img source[media="(min-width: 1800px)"][srcset*="/images/"]'):
    p = p['srcset'].split(',')[-1].split()[0]
    print(p)

Prints:

https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwe86ac579/images/BB500CB0WY001/BB500CB0WY001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw8c8efbee/images/BB50F2B0WY001/BB50F2B0WY001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw72d49df0/images/BB50F2B0WD001/BB50F2B0WD001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw16bf6873/images/BB50F0B0WD001/BB50F0B0WD001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwa89db782/images/BB50F0B0WD309/BB50F0B0WD309-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwb8bb418a/images/BB50F0B0WD051/BB50F0B0WD051-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dweacfc390/images/BB50F2B0WD292/BB50F2B0WD292-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw51675237/images/BB50F2B0WD051/BB50F2B0WD051-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw47ef9b42/images/BB50F3B0WD001/BB50F3B0WD001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw32b9df63/images/BB50F3B0WD051/BB50F3B0WD051-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw102294c8/images/BB50F3B0WD496/BB50F3B0WD496-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw09d01050/images/BB50F3B0WD662/BB50F3B0WD662-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw442b46a4/images/BB50F2B0WD542/BB50F2B0WD542-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw1e454ef3/images/BB50F2B0WD309/BB50F2B0WD309-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw3aa399b9/images/BB05117012542/BB05117012542-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw9eb8ec2d/images/BB05114012542/BB05114012542-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw7e12db48/images/BBU017B00B001/BBU017B00B001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw924ff9f6/images/BBU017B00B058/BBU017B00B058-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw1974540d/images/BBU017B00B662/BBU017B00B662-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw28c6592d/images/BBU017B00B140/BBU017B00B140-01-01.jpg?sw=800
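
The string manipulation on srcset in the loop above can be pulled out into a small helper. This is a minimal sketch, assuming the attribute holds comma-separated candidates of the form "URL descriptor" and that the last candidate is the one you want, as the loop does; `last_srcset_url` is a hypothetical name and the example value is made up.

```python
def last_srcset_url(srcset: str) -> str:
    """Return the URL of the last candidate in a srcset string.

    Mirrors p['srcset'].split(',')[-1].split()[0] from the loop above.
    """
    # Candidates are separated by commas; keep only the last one.
    last_candidate = srcset.split(',')[-1]
    # Each candidate is "URL descriptor"; the URL comes first.
    return last_candidate.split()[0]


# Illustrative srcset value (made-up paths, not copied from the site).
example = (
    "https://example.com/images/a.jpg?sw=400 1x, "
    "https://example.com/images/a.jpg?sw=800 2x"
)
print(last_srcset_url(example))
```

Splitting on commas only works while the URLs themselves contain no commas, which holds for the URLs printed above.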

EDIT: To get higher-quality images, change the ?sw= parameter to a larger value.

For example:

import requests
from bs4 import BeautifulSoup

url = 'https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for p in soup.select('picture.thumb-img source[media="(min-width: 1800px)"][srcset*="/images/"]'):
    # Take the last srcset candidate and swap in a larger ?sw= value.
    p = p['srcset'].split(',')[-1].split()[0].replace('?sw=800', '?sw=1920')
    print(p)
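
The `.replace('?sw=800', ...)` call only works when the URL already ends in exactly `?sw=800`; the URLs in the other answers use `?sw=466`, for instance. A sketch of a more tolerant variant, using only the standard library, is below. `with_width` is a hypothetical helper name, and whether the server actually returns a larger image for a given value is something to check against the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_width(url: str, width: int) -> str:
    """Return url with its sw query parameter set to width."""
    parts = urlsplit(url)
    # Keep every query parameter except an existing sw.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != 'sw']
    query.append(('sw', str(width)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# URL taken from the output shown in this thread.
thumb = ('https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/'
         'Sites-Givenchy_master/default/dwe86ac579/images/BB500CB0WY001/'
         'BB500CB0WY001-01-01.jpg?sw=466')
print(with_width(thumb, 1920))
```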

EDIT: To get bag names along the URLs you can use:

import requests
from bs4 import BeautifulSoup

url = 'https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for p in soup.select('picture.thumb-img source[media="(min-width: 1800px)"][srcset*="/images/"]'):
    pic_url = p['srcset'].split(',')[-1].split()[0].replace('?sw=800', '?sw=1920')
    # The product name is the next element with class "product-name" after the <source>.
    name = p.find_next(class_='product-name').get_text(strip=True)
    print(name, pic_url)
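
To save each image under the bag's name, the `name` and `pic_url` pairs from the loop above can be fed into something like the following. This is only a sketch: `safe_filename` and `save_image` are hypothetical helpers, the `bags` folder is an arbitrary choice, and the download uses the standard library's `urlopen` (`requests.get(url).content` would serve the same purpose).

```python
import re
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


def safe_filename(name: str, url: str) -> str:
    """Build a file name from a product name, keeping the URL's extension."""
    # Collapse anything that is not a letter or digit into single underscores.
    stem = re.sub(r'[^A-Za-z0-9]+', '_', name).strip('_')
    # Take the extension from the URL path, ignoring the ?sw=... query string.
    ext = Path(urlsplit(url).path).suffix or '.jpg'
    return stem + ext


def save_image(name: str, url: str, folder: str = 'bags') -> Path:
    """Download url into folder under a name derived from the product name."""
    target = Path(folder) / safe_filename(name, url)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as response:
        target.write_bytes(response.read())
    return target


# Made-up product name and URL, for illustration only.
print(safe_filename('Example Bag (Black)',
                    'https://example.com/images/X1/X1-01-01.jpg?sw=1920'))
```

If two products share a display name, the second download would overwrite the first, so appending the product code from the URL to the file name may be worth considering.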
Andrej Kesely
  • 1
    Okay so now my goal has changed. The images in your list are too low quality so I would like to be able to "click" each image and then on the page the image links to, download the higher quality images. How can I do that? – Fernando Varela Jun 29 '20 at 17:09
  • 1
    @FernandoVarela Change the `?sw=800` parameter in URLs to something like `?sw=1920` or more..., see my edit. – Andrej Kesely Jun 29 '20 at 17:10
  • Extra question. Is there a way for me to match the image url with the name of the bag as it is displayed on the website? Basically when I download these images I would like to have the name of the bag as the name of the file. – Fernando Varela Jul 06 '20 at 03:00
0

You are probably inspecting the element after hovering over the image, which is why it is giving you the image of the model. On hover, the link is updated from the original bag image:

givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwe86ac579/images/BB500CB0WY001/BB500CB0WY001-01-01.jpg?sw=466

to the model's image:

givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwd050ac75/images/BB500CB0WY001/BB500CB0WY001-01-02.jpg?sw=466

Compare the ends of the two file names: -01-01.jpg for the bag and -01-02.jpg for the model. Try drilling down to the XPath below without hovering over the bag image:
/html/body/div[1]/main/div[5]/div[2]/div[3]/div/div/ul/li[1]/div/figure/a[1]/picture[1]/source[3]
As Andrej pointed out above, you can use BeautifulSoup to achieve this.
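
If you end up with a mixed list of URLs, the file-name suffix can be used to keep only the bag shots. The sketch below assumes, based solely on the two URLs quoted in this answer, that a name ending in -01-01.jpg is the bag on its own and anything else is another view; `is_bag_only` is a hypothetical helper.

```python
import re

# Matches the trailing "-NN-NN.jpg" part of the file name, before any query string.
VIEW_SUFFIX = re.compile(r'-(\d{2})-(\d{2})\.jpg(?:\?|$)')


def is_bag_only(url: str) -> bool:
    """Return True if the URL's file name ends in -NN-01.jpg."""
    match = VIEW_SUFFIX.search(url)
    # Assumption: the second number is the view index and 01 is the bag-only shot.
    return bool(match) and match.group(2) == '01'


base = ('givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/'
        'Sites-Givenchy_master/default/')
bag = base + 'dwe86ac579/images/BB500CB0WY001/BB500CB0WY001-01-01.jpg?sw=466'
model = base + 'dwd050ac75/images/BB500CB0WY001/BB500CB0WY001-01-02.jpg?sw=466'
print(is_bag_only(bag), is_bag_only(model))
```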

SKS
0

To print the value of the srcset attribute of the images before hovering the mouse over them, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR:

    driver.get('https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21')
    print([my_elem.get_attribute("srcset") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "ul.search-result-items.tiles-container.js-slv-product-grid.row figure.product-image picture.thumb-img img")))])
    
  • Using XPATH:

    driver.get('https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21')
    print([my_elem.get_attribute("srcset") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='search-result-items tiles-container js-slv-product-grid row']//figure[contains(@class, 'product-image ')]//picture[@class='thumb-img']//img")))])
    
  • Console Output:

    ['https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwe86ac579/images/BB500CB0WY001/BB500CB0WY001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw8c8efbee/images/BB50F2B0WY001/BB50F2B0WY001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw2264f584/LOOKS%20FWxS20/ECOM2.jpg?sw=1000', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw72d49df0/images/BB50F2B0WD001/BB50F2B0WD001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw16bf6873/images/BB50F0B0WD001/BB50F0B0WD001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwa89db782/images/BB50F0B0WD309/BB50F0B0WD309-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwb8bb418a/images/BB50F0B0WD051/BB50F0B0WD051-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dweacfc390/images/BB50F2B0WD292/BB50F2B0WD292-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw51675237/images/BB50F2B0WD051/BB50F2B0WD051-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw47ef9b42/images/BB50F3B0WD001/BB50F3B0WD001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw32b9df63/images/BB50F3B0WD051/BB50F3B0WD051-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw102294c8/images/BB50F3B0WD496/BB50F3B0WD496-01-01.jpg?sw=466', 
'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw09d01050/images/BB50F3B0WD662/BB50F3B0WD662-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw442b46a4/images/BB50F2B0WD542/BB50F2B0WD542-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw1e454ef3/images/BB50F2B0WD309/BB50F2B0WD309-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw3aa399b9/images/BB05117012542/BB05117012542-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw9eb8ec2d/images/BB05114012542/BB05114012542-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw7e12db48/images/BBU017B00B001/BBU017B00B001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw924ff9f6/images/BBU017B00B058/BBU017B00B058-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw1974540d/images/BBU017B00B662/BBU017B00B662-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw28c6592d/images/BBU017B00B140/BBU017B00B140-01-01.jpg?sw=466']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
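
The console output above contains one URL under LOOKS%20FWxS20/ that does not follow the /images/ path of the others. Assuming that entry is not a product tile, a plain filter on the path, similar in spirit to the `[srcset*="/images/"]` selector in the BeautifulSoup answer, drops it. `product_images_only` is a hypothetical helper name.

```python
def product_images_only(urls):
    """Keep only the URLs whose path contains an /images/ segment."""
    return [url for url in urls if '/images/' in url]


# Three entries copied from the console output above.
prefix = ('https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/'
          'Sites-Givenchy_master/default/')
sample = [
    prefix + 'dw8c8efbee/images/BB50F2B0WY001/BB50F2B0WY001-01-01.jpg?sw=466',
    prefix + 'dw2264f584/LOOKS%20FWxS20/ECOM2.jpg?sw=1000',
    prefix + 'dw72d49df0/images/BB50F2B0WD001/BB50F2B0WD001-01-01.jpg?sw=466',
]
for url in product_images_only(sample):
    print(url)
```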
    
undetected Selenium