I am trying to download 10 try-on images from https://crosset.onward.co.jp/coordinate/2828615 using Python and Selenium, but I only get 3 images. The code and relevant output are below.

The page uses <div class="swiper-slide"> elements, which might be relevant to this issue, but I have no idea how to fix it.

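from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()  # assumed; options was not shown in the post
site = "https://crosset.onward.co.jp/coordinate/2828615"
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get(site)

#driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
#sleep(0.5)

name = driver.find_element(By.CLASS_NAME, "c-breadcrumb__head-text").get_attribute("innerText").strip()
images = driver.find_element(By.CSS_SELECTOR, ".p-styling-image.p-styling-image__content")
images = images.find_elements(By.CSS_SELECTOR, ".p-link-card__image-body--contain.lazyloaded")
print("number of images : ", len(images))
print(images)
# Read every src first: navigating away makes the found elements stale.
srcs = [image.get_attribute("src") for image in images]
for src in srcs:
    driver.get(src)
    img = driver.find_element(By.TAG_NAME, 'img').screenshot_as_png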

Output:

number of images : 3
[<selenium.webdriver.remote.webelement.WebElement (session="c86015b781750f58884130fac60ea02f", element="1dae6ac3-25f7-4d35-a97e-c3b857e70997")>, <selenium.webdriver.remote.webelement.WebElement (session="c86015b781750f58884130fac60ea02f", element="34b598fa-f1aa-4d71-8928-330857f3a11c")>, <selenium.webdriver.remote.webelement.WebElement (session="c86015b781750f58884130fac60ea02f", element="f4d20c56-f209-41c7-ad34-25c5935aed47")>]
tothemoon
  • Try scrolling down prior to finding the elements. – Arundeep Chohan Jun 22 '22 at 05:34
  • I tried scrolling down, following https://stackoverflow.com/questions/20986631/how-can-i-scroll-a-web-page-using-selenium-webdriver-in-python, but it did not work. – tothemoon Jun 22 '22 at 05:49
  • Hi, your element CSS gives you a clue: `contain.lazyloaded`. It's very likely that the site's content is loaded by JS after the page is ready. Selenium's default strategy only waits for the page load, so your code runs before the JS has executed and therefore before it has loaded the images or structure. Slow the script down with more synchronisation, or wait for something (at the bottom of the page) that loads last. Arundeep Chohan is right too, it will need a scroll – RichEdwards Jun 22 '22 at 07:59
  • I've had a look at the site and it's more complex than at first glance. It's lazy loaded, you need to clear the delivery iframe, and the images don't exist in the DOM to select until you interact and start browsing them. I've done the first bit and will look at the click logic shortly – RichEdwards Jun 22 '22 at 08:25
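
The "increased synchronisation" suggested in the comments can be sketched as a small polling helper. This is only an illustration in plain Python: `wait_for_count` and its parameters are hypothetical names, not part of Selenium, and Selenium's own `WebDriverWait` covers the same need when a ready-made expected condition fits.

```python
import time


def wait_for_count(find, minimum, timeout=15.0, poll=0.5):
    """Call find() repeatedly until it returns at least `minimum` items.

    `find` is any zero-argument callable that returns a list.
    Returns the last result, or raises TimeoutError if the count
    is not reached before `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = find()
        if len(found) >= minimum:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {len(found)} of {minimum} items after {timeout}s"
            )
        time.sleep(poll)
```

With a live driver, `find` could be something like `lambda: driver.find_elements(By.CSS_SELECTOR, "div.swiper-slide img")` with `minimum=10`. Whether waiting alone is enough on this page is not certain, since the comments also mention scrolling and a blocking iframe.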

2 Answers

Try this code to get the required images:

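from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()  # assumed; options was not shown in the post
site = "https://crosset.onward.co.jp/coordinate/2828615"
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get(site)

# Each slide of the swiper carousel wraps one image.
containers = driver.find_elements(By.CSS_SELECTOR, "div.swiper-slide")

images = []

for container in containers:
    image = container.find_element(By.TAG_NAME, 'img')
    # data-src holds the image URL even if the image has not been lazy-loaded yet
    images.append(image.get_attribute('data-src'))

images = set(images)  # There are a couple of duplicates, so this line is required to keep only unique images
print(len(images))
print(images)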
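
Since the goal of the question is to download the images, the URLs collected in `images` could be saved to disk directly rather than screenshotted. The following is a minimal sketch using only the standard library; `filename_from_url`, `download_images` and the `images` target directory are hypothetical names chosen for illustration, and it assumes the server accepts plain HTTP requests without cookies or special headers.

```python
import os
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path component of the URL, ignoring any query string."""
    return os.path.basename(urlparse(url).path)


def download_images(urls, target_dir="images"):
    """Download each URL into target_dir and return the saved file paths."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for url in sorted(urls):  # sorted() gives a stable order for a set
        path = os.path.join(target_dir, filename_from_url(url))
        urllib.request.urlretrieve(url, path)
        saved.append(path)
    return saved
```

Calling `download_images(images)` after the code above would then write one file per unique URL. If the site rejects the default urllib user agent, a request with custom headers would be needed instead.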
JaSON

JaSON beat me by a couple of minutes.

But here's another approach, which also clears the iframe:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Continues after driver.get(site).
# Clear the iframe which displays the blocking popup.
WebDriverWait(driver, 15).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@id='buyee-bcFrame']")))
WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.XPATH, '//div[@class="bc__closeBtn"]'))).click()

# Move the driver back to the main window.
driver.switch_to.parent_frame()

# Click on all the thumbnails to trigger the lazy loading.
thumbnailGallery = driver.find_elements(By.CSS_SELECTOR, '.c-thumbnail-gallery a')
for thumbnail in thumbnailGallery:
    thumbnail.click()

# Get the big links. The only difference between the thumbnail URLs and these is _s versus _l.
bigPictures = driver.find_elements(By.XPATH, "//img[contains(@src,'_l.jpg')]")
print("number of images : ", len(bigPictures))
for bigPicture in bigPictures:
    print(bigPicture.get_attribute("src"))


Output:

number of images :  12
https://static.staff-start.com/img/coordinates/33/c1082e40d6bd5c5d64f2ccb8f73cea5c-17430/3911f32407ac0e070fd20661f58d0b3a_l.jpg
https://static.staff-start.com/img/coordinates/33/c1082e40d6bd5c5d64f2ccb8f73cea5c-17430/bc9be091dd900105f744dd16f11543bf_l.jpg
https://static.staff-start.com/img/coordinates/33/c1082e40d6bd5c5d64f2ccb8f73cea5c-17430/3bc9b444bb17de5ea3e77e3bb92b4331_l.jpg
https://static.staff-start.com/img/coordinates/33/c1082e40d6bd5c5d64f2ccb8f73cea5c-17430/ef64b151cf56da6cd2460e2a51ff5d59_l.jpg
https://static.staff-start.com/img/coordinates/33/c1082e40d6bd5c5d64f2ccb8f73cea5c-17430/9b26f7bbd4a33313a7484a198a18d550_l.jpg
https://static.staff-start.com/img/coordinates/33/c1082e40d6bd5c5d64f2ccb8f73cea5c-17430/b854355ed8a04d2369e8a900ef9c93c1_l.jpg
https://static.staff-start.com/img/coordinates/33/c1082e40d6bd5c5d64f2ccb8f73cea5c-17430/8a470c7c5a5933cd3e24d308672860af_l.jpg
https://static.staff-start.com/img/coordinates/33/c1082e40d6bd5c5d64f2ccb8f73cea5c-17430/14b91842c6eac8513b8524e7b3ccd041_l.jpg
https://static.staff-start.com/img/coordinates/33/c1082e40d6bd5c5d64f2ccb8f73cea5c-17430/5e0a42a4e2755037c7d283a6e490478c_l.jpg
https://static.staff-start.com/img/coordinates/33/c1082e40d6bd5c5d64f2ccb8f73cea5c-17430/25ecddec003434b66d7f647eeb39493b_l.jpg
https://static.staff-start.com/img/coordinates/33/c1082e40d6bd5c5d64f2ccb8f73cea5c-17430/3911f32407ac0e070fd20661f58d0b3a_l.jpg
https://static.staff-start.com/img/coordinates/33/c1082e40d6bd5c5d64f2ccb8f73cea5c-17430/bc9be091dd900105f744dd16f11543bf_l.jpg
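
In the output above, the last two URLs repeat the first two, so there are 10 distinct images among the 12 results. A minimal sketch of dropping the repeats while keeping the original order follows; `unique_in_order` is a hypothetical helper name, and the short strings stand in for the real URLs.

```python
def unique_in_order(urls):
    """Drop repeated URLs while keeping the first occurrence of each."""
    # dict preserves insertion order, so duplicate keys collapse in place
    return list(dict.fromkeys(urls))


# Placeholder values standing in for the src attributes printed above.
srcs = ["a_l.jpg", "b_l.jpg", "c_l.jpg", "a_l.jpg", "b_l.jpg"]
print(unique_in_order(srcs))
```

With a live driver, the input would be `[p.get_attribute("src") for p in bigPictures]`. Unlike `set()`, this keeps the gallery order, which may matter if the files are numbered when saved.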
RichEdwards