I'm scraping all of the Air Jordan data from grailed.com (https://www.grailed.com/designers/jordan-brand/hi-top-sneakers). For each listing I store the size, model, URL, and image URL in an object. My program scrolls through the entire feed and fetches all of this, and everything works except finding the image URL. The issue seems to be that for some elements in the feed, Selenium doesn't detect the div containing the image or its URL. I have manually checked these cases, and they do have images in the same structure. My current code looks like this:

    feed = driver.find_elements_by_class_name('feed-item')
    for item in feed:
        # Find the div containing the image
        img_div = item.find_element_by_class_name("listing-cover-photo")
        img = img_div.find_element_by_tag_name('img')

I have tried a couple of other things as well. Sometimes Selenium reports that it can't find an element with the class "listing-cover-photo", even though when I inspect those same items by hand the element is there. How should I debug or fix this?

Eric Hasegawa

1 Answer

To get the image src value you need to scroll the page first. Then use WebDriverWait() with visibility_of_all_elements_located() and the following CSS selector.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.grailed.com/designers/jordan-brand/hi-top-sneakers")
# Scroll to the bottom so the feed starts loading its images
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
# Wait up to 10 seconds until the cover images are visible
images = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located(
        (By.CSS_SELECTOR, ".feed-item .listing-cover-photo>img")
    )
)
for image in images:
    print(image.get_attribute("src"))

Output:

https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/AYHhwtgRxSkdTtZ2fMoi
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/yPm24xb1QeyNJvmlKriU
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/0PmW3y2SOmvy9iDHr44q
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/0huJrabvQyei6H8xVZWS
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/23Bx5rr8SR2Pv53lO9Hb
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/dsdGACdNRse93DpTN9Sl
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/KQ3z8G9DQFWTjNkO6Obp
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/mF8nkq8LTzi2fTuCfAAS
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/X9tLf5KzSreO1QW2QX4w
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/gNnXP7ToTnl9hjSEiRrz
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/LMFdqBosRI2NLDCkR9Ze
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/htBeZs05SNyflHqpd7pC
KunduK
  • This is great! However, I want to pull each image as I iterate through the list of items. How would I alter this code so that it pulls one image at a time using item, rather than pulling one large list? Also, this pulls 16 images per scroll, whereas there are 40 items. – Eric Hasegawa Jun 23 '20 at 20:26
  • @EricHasegawa: No, there are not 40 items. If you continue scrolling you will get more items. You need to scroll in a loop and add each image src value to a list. – KunduK Jun 23 '20 at 21:21
  • I am doing that; I meant 40 items per individual scroll. Is there a way to get each image as I loop through the items, instead of getting one big block of them as you did here? Also, if you don't want to spend much time on it, pointing me in the right direction or towards resources would be great. – Eric Hasegawa Jun 23 '20 at 21:34
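
One way to approach the per-item question from the comments is sketched below. It assumes (this is a guess, not confirmed anywhere in the thread) that the feed only fills in an item's image once that item is near the viewport, which would explain why some items appear to have no "listing-cover-photo" element. The helper names `cover_image_src` and `iter_cover_images` are hypothetical, and the selectors are the ones used in the answer above. The sketch uses the generic `find_elements(by, value)` call with the string "css selector", which is the value of Selenium's `By.CSS_SELECTOR`, so it does not depend on the older `find_element_by_*` helpers.

```python
import time

# String value of selenium.webdriver.common.by.By.CSS_SELECTOR
CSS = "css selector"


def cover_image_src(driver, item, timeout=5.0, poll=0.25):
    """Scroll one feed item into view and return its cover image src.

    Returns None if no image with a src shows up within `timeout` seconds.
    """
    # Bring the item into the viewport (assumed to trigger image loading).
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", item
    )
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns an empty list instead of raising,
        # so a missing image is not an exception here.
        imgs = item.find_elements(CSS, ".listing-cover-photo img")
        if imgs:
            src = imgs[0].get_attribute("src")
            if src:
                return src
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def iter_cover_images(driver, timeout=5.0):
    """Yield (item, src) for each feed item currently in the DOM."""
    for item in driver.find_elements(CSS, ".feed-item"):
        yield item, cover_image_src(driver, item, timeout=timeout)
```

With a real driver this could be used as `for item, src in iter_cover_images(driver): ...` inside the existing scroll loop, so the size, model, and URL can be read from the same `item` alongside `src`. Items whose image never appears come back with `src` set to `None` rather than raising, which makes the failing cases easier to log and inspect.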