I have the following issue while working with Python and Selenium:

I am trying to get the source links (the src attributes) of the images on a webpage. I put the variable a in the XPath because that is the index that changes in the loop. There are about 30 images on the page.

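from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver.get(url)
a = 1
while a != 100:
    # Build the XPath for the a-th image; the index must be concatenated, not left inside the quotes
    xpath = "/html/body/div[4]/div[5]/div[2]/div[3]/div/div/div/div/div[" + str(a) + "]/div[1]/a/img"
    try:
        print(WebDriverWait(driver, 3).until(EC.visibility_of_element_located((By.XPATH, xpath))).get_attribute('src'))
    except TimeoutException:
        break  # no visible image at this index within 3 seconds
    a = a + 1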

It worked fine for the first 8 pictures, but for the ninth it failed: it couldn't find the element. To check, I ran the command outside the loop like this:

print(WebDriverWait(driver, 3).until(EC.visibility_of_element_located((By.XPATH, "/html/body/div[4]/div[5]/div[2]/div[3]/div/div/div/div/div[9]/div[1]/a/img"))).get_attribute('src'))

It still couldn't get the ninth picture. However, after an hour I discovered that if I maximized the window, it was able to get the links up to image 19.

So my questions are:

Why is my program dependent on the window size in order to locate elements?

Suppose I have an enormously long page with 1000 images on it, and with a maximized window only 12 of them are visible. How can I get the links of all the pictures? Should I write code that scrolls down and runs the lookup, scrolls further down and runs it again, and so on? Or is there a better way?
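
To make the scroll-and-repeat idea concrete, here is a minimal sketch, not a confirmed solution. It assumes the page adds more images to the DOM as it is scrolled (lazy loading) and that the page height stops growing once everything is loaded. The function name collect_image_links and its parameters pause and max_scrolls are made up for illustration; "tag name" is the string value behind Selenium's By.TAG_NAME.

```python
import time


def collect_image_links(driver, pause=1.0, max_scrolls=200):
    """Scroll to the bottom repeatedly and gather the src of every <img> seen.

    Assumption: new images appear in the DOM as the page is scrolled, and
    document.body.scrollHeight stops changing when nothing more loads.
    """
    links = []          # src values in the order they were first seen
    last_height = -1    # page height measured on the previous pass
    for _ in range(max_scrolls):
        # Record every image that is currently in the DOM.
        for img in driver.find_elements("tag name", "img"):
            src = img.get_attribute("src")
            if src and src not in links:
                links.append(src)

        # Stop once the last scroll did not make the page any taller.
        height = driver.execute_script("return document.body.scrollHeight;")
        if height == last_height:
            break
        last_height = height

        # Jump to the bottom and give the page time to load more images.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
    return links
```

It would be called as links = collect_image_links(driver) right after driver.get(url). If the page only swaps in the real src when an image enters the viewport, jumping straight to the bottom may skip images, and scrolling in smaller steps would be the variation to try.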

Note: I am very new to programming and I don't use classes or other fancy things (OOP), so please keep that in mind when answering my questions.

Can Karakus
  • Maybe the HTML renders differently for different screen sizes; pagination can also do that. Can you share the link you are trying to crawl so we can view the HTML? – cristian camilo cedeño gallego Mar 08 '20 at 03:36
  • This is probably the result of lazy loading: new images are added to the DOM with JavaScript when you scroll to the bottom of the page. – Guy Mar 08 '20 at 05:22
  • Thanks for replying. The website I need to crawl is a company website that only I have access to. Yes, I noticed the lazy loading too. What can I do to tackle this problem? Do I need to know JavaScript? – Can Karakus Mar 08 '20 at 12:30
  • I found a similar website: https://www.bookdepository.com/category/2/Art-Photography/browse/viewmode/all – Can Karakus Mar 09 '20 at 19:02
