
There is an e-commerce page, https://www.jooraccess.com/r/products?token=feba69103f6c9789270a1412954cf250, that lists hundreds of products, and each product has an image slider (or slideshow, whatever you call it). I need to scrape all the images from the page. I know how to grab the first image in each slider, but I can't figure out how to scrape the rest of the images in each slider.

I have inspected the element and noticed that each time I change the image in the slider, this part

<div data-position="4" class="PhotoBreadcrumb_active__2T6z2 PhotoBreadcrumb_dot__2PbsQ"></div> 

moves through these positions (in the example below, image #4 is selected):

<div class="PhotoBreadcrumb_breadcrumbContainer__2cALf" data-testid="breadcrumbContainer">
    <div data-position="0" class="PhotoBreadcrumb_dot__2PbsQ"></div>
    <div data-position="1" class="PhotoBreadcrumb_dot__2PbsQ"></div>
    <div data-position="2" class="PhotoBreadcrumb_dot__2PbsQ"></div>
    <div data-position="3" class="PhotoBreadcrumb_dot__2PbsQ"></div>
    <div data-position="4" class="PhotoBreadcrumb_active__2T6z2 PhotoBreadcrumb_dot__2PbsQ"></div>
    <div data-position="5" class="PhotoBreadcrumb_dot__2PbsQ"></div>
</div>
hkm

2 Answers


To scrape the src values of all the images in the first slider you need to:

  • Click each dot in the slider, using WebDriverWait with element_to_be_clickable()

  • Collect the src attribute of the displayed image after each click, using WebDriverWait with visibility_of_element_located()

  • You can use the following locator strategies:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()  # assumption: Chrome; any WebDriver works
    driver.get("https://www.jooraccess.com/r/products?token=feba69103f6c9789270a1412954cf250")
    wait = WebDriverWait(driver, 20)
    # Container of the first product's slider
    slider = "//div[contains(@class, 'Grid_Row__2R-IV') and contains(@class, 'Grid_left')]/div"
    # Print the image shown on load (position 0)
    print(wait.until(EC.visibility_of_element_located((By.XPATH, slider + "//img"))).get_attribute("src"))
    # Click dots 1 to 4 in turn and print the image displayed after each click
    for position in range(1, 5):
        wait.until(EC.element_to_be_clickable((By.XPATH, f"{slider}//div[@data-position='{position}']"))).click()
        print(wait.until(EC.visibility_of_element_located((By.XPATH, slider + "//img"))).get_attribute("src"))
    
  • Console Output:

    https://cdn.jooraccess.com/img/uploads/accounts/678917/images/Sundays_NYC_3202%20(1).jpg
    https://cdn.jooraccess.com/img/uploads/accounts/678917/images/Sundays_NYC_3207.jpg
    https://cdn.jooraccess.com/img/uploads/accounts/678917/images/Maya%20dress_Floral03.jpg
    https://cdn.jooraccess.com/img/uploads/accounts/678917/images/Maya%20dress_Floral04.jpg
    https://cdn.jooraccess.com/img/uploads/accounts/678917/images/Maya%20dress_Floral05.jpg
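
The same click-then-read steps could be generalized so the number of dots is not hardcoded. The following is only a sketch under assumptions: the function names `collect_slider_images` and `scrape_first_slider` are hypothetical, Chrome is assumed as the browser, and the dots are assumed to sit inside the `breadcrumbContainer` element shown in the question, under the same slider XPath used above.

```python
from typing import Callable, List


def collect_slider_images(click_dot: Callable[[int], None],
                          current_src: Callable[[], str],
                          dot_count: int) -> List[str]:
    """Return the src displayed at each dot position, in order."""
    sources = [current_src()]            # position 0 is shown on load
    for position in range(1, dot_count):
        click_dot(position)              # switch the slider to this position
        sources.append(current_src())    # read whatever image is now displayed
    return sources


def scrape_first_slider(url: str) -> List[str]:
    """Hypothetical Selenium wiring for the first slider on the page."""
    # Imported here so collect_slider_images can be exercised without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    slider = "//div[contains(@class, 'Grid_Row__2R-IV') and contains(@class, 'Grid_left')]/div"
    driver = webdriver.Chrome()  # assumption: Chrome; any WebDriver would do
    try:
        driver.get(url)
        wait = WebDriverWait(driver, 20)

        def current_src() -> str:
            img = wait.until(EC.visibility_of_element_located((By.XPATH, slider + "//img")))
            return img.get_attribute("src")

        def click_dot(position: int) -> None:
            dot = (By.XPATH, f"{slider}//div[@data-position='{position}']")
            wait.until(EC.element_to_be_clickable(dot)).click()

        current_src()  # make sure the slider has rendered before counting dots
        # Assumption: the dots live in the container marked data-testid="breadcrumbContainer".
        container = driver.find_element(
            By.XPATH, slider + "//div[@data-testid='breadcrumbContainer']")
        dot_count = len(container.find_elements(By.XPATH, "./div[@data-position]"))
        return collect_slider_images(click_dot, current_src, dot_count)
    finally:
        driver.quit()
```

Calling `scrape_first_slider(...)` with the page URL would return one src per dot. Keeping the loop separate from the Selenium calls makes it easy to check with stand-in functions; note that this sketch reads the image immediately after each click, so a slow image swap could yield the previous src.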
    
undetected Selenium

You cannot collect all those images automatically.
Only one image per product is displayed, and exists on the page, at any given time.
To change the image (that is, to load another one) you have to click the thumbnail radio buttons below each product. This causes some JS to load another image for that product.
In other words, the images that are not displayed do not exist on the page until they are loaded by clicking the radio buttons (thumbnails) below each product.
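
Because an image only appears after its button is clicked, any script that clicks would also have to wait for the displayed src to actually change before reading it. A minimal sketch of such a wait follows; the helper name `wait_for_new_value` and its default timings are assumptions for illustration, not something the page or Selenium provides.

```python
import time
from typing import Callable


def wait_for_new_value(read: Callable[[], str], previous: str,
                       timeout: float = 10.0, poll: float = 0.2) -> str:
    """Poll read() until it returns a non-empty value different from `previous`."""
    deadline = time.monotonic() + timeout
    while True:
        value = read()
        if value and value != previous:
            return value                 # the new image has been loaded
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"value did not change from {previous!r} within {timeout}s")
        time.sleep(poll)                 # give the page's JS time to swap the image
```

With Selenium, `read` could be a small function that locates the product's `img` element and returns its `src`, called once before the click to obtain `previous`. If two consecutive positions happen to show the same image, this wait would time out, so the caller would need to handle `TimeoutError`.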

Prophet
  • Ok, how do I collect the 1st images though? I thought I had it figured out, but my methods didn't work – hkm Feb 28 '22 at 19:33
  • Please show what you have tried so far and we will try to help – Prophet Feb 28 '22 at 19:34
  • This XPath `//div[contains(@class,'Photo')]//img` will give you all the images. What you do with them, and how, depends on your implementation flow. – Prophet Feb 28 '22 at 19:39
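
As a rough illustration of that last comment, the XPath could be used like this to collect the src of every image currently on the page (that is, the first image of each slider). This is a sketch only: the names `first_image_urls` and `main` are hypothetical, Chrome is assumed, and it does not handle products that might load only after scrolling.

```python
from typing import List

PHOTO_IMG_XPATH = "//div[contains(@class,'Photo')]//img"


def first_image_urls(driver) -> List[str]:
    """Return the src of every <img> matched by the XPath, skipping empty values."""
    # "xpath" is the string value of selenium's By.XPATH locator strategy.
    images = driver.find_elements("xpath", PHOTO_IMG_XPATH)
    return [src for src in (img.get_attribute("src") for img in images) if src]


def main() -> None:
    # Imported here so first_image_urls can be tried with a stand-in driver.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome; any WebDriver would do
    try:
        driver.get("https://www.jooraccess.com/r/products?token=feba69103f6c9789270a1412954cf250")
        # Wait until at least one product image is present before collecting.
        WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.XPATH, PHOTO_IMG_XPATH)))
        for url in first_image_urls(driver):
            print(url)
    finally:
        driver.quit()
```

Running `main()` would print one URL per image found at that moment.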