Why is selenium only picking up the first 12 items?

Question

I'm trying to create a web scraper for a website (https://pokemondb.net/pokedex/national) that copies a list of images and saves them in a directory. Everything seems to work, except that instead of picking up the 800+ items that I was hoping it would, it only picks up 12. I've tried using selenium's implicit_wait, but it doesn't seem to work. I would like it to scrape every picture on the page.

Below is my code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import shutil
import os
import requests

def spritescrape(driver):
    sprites_list = driver.find_elements_by_tag_name('img')
    sprite_srcs = [sprite.get_attribute('src') for sprite in sprites_list]
    return sprite_srcs

def download_images(srcs, dirname):
    for index, src in enumerate(srcs):
        response = requests.get(src, stream=True)
        save_image(response, dirname, index)
    del response

def save_image(image, dirname, suffix):
    with open('{dirname}/img_{suffix}.jpg'.format(dirname=dirname, suffix=suffix), 'wb') as out_file:
        shutil.copyfileobj(image.raw, out_file)

def make_dir(dirname):
    current_path = os.getcwd()
    path = os.path.join(current_path, dirname)
    if not os.path.exists(path):
        os.makedirs(path)

if __name__ == '__main__':
    chromeexe_path = r'C:\code\Learning Python\Scrapers\chromedriver.exe'
    driver = webdriver.Chrome(executable_path=chromeexe_path)
    driver.get(r'https://pokemondb.net/pokedex/national')
    driver.implicitly_wait(10)

    sprite_links = spritescrape(driver)
    dirname = 'sprites'
    make_dir(dirname)
    download_images(sprite_links, dirname)

I've heard that some websites can be built in ways that prevent scraping, and I wonder if this is the case for this website. I'm very new to coding, so any help with getting all of the images would be greatly appreciated!

score 2 · Answer 1 · answered Jan 27 '20 at 20:24

You need to scroll the pages to the bottom.However if you go directly to the scrollHeight you will loose all the elements again.you need to use infinite loop and scroll slowly per page and add the elements attribute during scrolling so that it never lost further.I have got 890 elements.

Try the below code.

from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://pokemondb.net/pokedex/national")

sprite_srcs=[]
height=1000
itemsnobefore=len(sprite_srcs)
while True:
    driver.execute_script("window.scrollTo(0," + str(height) + ");")
    sprites_list = driver.find_elements_by_tag_name('img')

    for sprite in sprites_list:
        if sprite.get_attribute('src') not in sprite_srcs:
            sprite_srcs.append(sprite.get_attribute('src'))

    itemsnoafter=len(sprite_srcs)
    #Break the loop when there is no more image tag left
    if itemsnobefore==itemsnoafter:
        break
    itemsnobefore=itemsnoafter
    height=height+500
    time.sleep(0.25)

print(len(sprites_list))

Thank you so much - this worked perfectly! Can you explain how you got a starting height of 1000? — zeshaykes, Jan 27 '20 at 20:43

score 0 · Answer 2 · answered Jan 27 '20 at 19:56

0

All the elements are not loading when the page first opens. It appears they only load as your scroll down the page. What I've done in situations like this is to do a scroll to the bottom of the page first and then find elements. This has worked for my needs.

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

answered Jan 27 '20 at 19:56

RKelley

1,099
8
14

Thank you! I just tried this, and it seems like it got the first 12 elements and the last 20, but none of the ones in between. Is there a way to scroll a bit, scrape, scroll a bit more, scrape again, etc. on repeat until I reach the end of the page? I'm not familiar with scrolling in selenium. – zeshaykes Jan 27 '20 at 20:17
Thank you. I'm actually going to go with the answer @KunduK kindly provided for now, but if you have anything you think is better/more efficient I would be very happy to see it. – zeshaykes Jan 27 '20 at 20:46

undetected Selenium · Answer 3 · 2020-01-27T21:42:56.687

The elements within the website uses Lazy Loading. So to extract the list of src attributes of the images you have to scroll down till the end of the page and you can use the following Locator Strategies:

Code Block:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions() 
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get("https://pokemondb.net/pokedex/national")
myLength = len(WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//img[@class]"))))
while True:
    try:
        driver.execute_script("window.scrollBy(0,1500)", "");
        WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//img[@class]")))
        WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements_by_xpath("//img[@class]")) > myLength)
        elements = driver.find_elements_by_xpath("//img[@class]")
        myLength = len(elements)
    except TimeoutException:
        break
print(myLength)
for element in elements:
    print(element.get_attribute("src"))
driver.quit()

Console Output:

890
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/bulbasaur.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/ivysaur.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/venusaur.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/charmander.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/charmeleon.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/charizard.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/squirtle.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/wartortle.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/blastoise.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/caterpie.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/metapod.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/butterfree.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/weedle.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/kakuna.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/beedrill.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pidgey.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pidgeotto.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pidgeot.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/rattata.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/raticate.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/spearow.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/fearow.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/ekans.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/arbok.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pikachu.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/raichu.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/sandshrew.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/sandslash.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoran-f.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidorina.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoqueen.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoran-m.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidorino.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoking.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/clefairy.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/clefable.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/vulpix.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/ninetales.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/jigglypuff.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/wigglytuff.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/zubat.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/golbat.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/oddish.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/gloom.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/vileplume.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/paras.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/parasect.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/venonat.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/venomoth.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/diglett.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/dugtrio.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/meowth.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/persian.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/psyduck.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/golduck.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/mankey.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/primeape.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/growlithe.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/arcanine.png
.
.
.
https://img.pokemondb.net/sprites/sword-shield/pixel/dreepy.png
https://img.pokemondb.net/sprites/sword-shield/pixel/drakloak.png
https://img.pokemondb.net/sprites/sword-shield/pixel/dragapult.png
https://img.pokemondb.net/sprites/sword-shield/pixel/zacian-crowned.png
https://img.pokemondb.net/sprites/sword-shield/pixel/zamazenta-crowned.png
https://img.pokemondb.net/sprites/sword-shield/pixel/eternatus.png

Why is selenium only picking up the first 12 items?

3 Answers3