Selenium: retrieve data that loads while scrolling down

Question

I'm trying to retrieve elements from a page that loads more content via AJAX as you scroll down, à la Twitter. For some reason this isn't working properly. I added some print statements to debug it: I always get the same number of items, and then the function returns. What am I doing wrong here?

wd = webdriver.Firefox()
wd.implicitly_wait(3)

def get_items(items):
    print len(items)
    wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # len(items) and len(wd.find_elements-by...()) both always seem to return the same number
    # if I were to start the loop with while True: it would work, but of course... never end
    while len(wd.find_elements_by_class_name('stream-item')) > len(items):
        items = wd.find_elements_by_class_name('stream-item')
        print items
        wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    return items

def test():
    get_page('http://twitter.com/')
    get_items(wd.find_elements_by_class_name('stream-item'))

What do you mean by "not working properly"? Is there an error message? — Amey, Jan 29 '13 at 13:57
No, it's simply not finding the data that is loaded when I execute the scroll-down, so the condition in the while loop is always false — la_f0ka, Jan 29 '13 at 14:01

Amey · Accepted Answer · 2015-09-17T20:12:39.480

Try putting a sleep between the scroll and the check for new items, so the AJAX request has time to finish:

from time import sleep

from selenium import webdriver

wd = webdriver.Firefox()
wd.implicitly_wait(3)

def get_items(items):
    print len(items)
    wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # len(items) and len(wd.find_elements-by...()) both always seem to return the same number
    # if I were to start the loop with while True: it would work, but of course... never end

    sleep(5)  # seconds; give the first batch of new items time to load
    while len(wd.find_elements_by_class_name('stream-item')) > len(items):
        items = wd.find_elements_by_class_name('stream-item')
        print items
        wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(5)  # wait again so the next check sees the newly loaded items
    return items

def test():
    get_page('http://twitter.com/')
    get_items(wd.find_elements_by_class_name('stream-item'))

Note: The hard sleep is only there to demonstrate that waiting fixes the problem. In real code, use Selenium's explicit waits (WebDriverWait) to wait for a meaningful condition instead, such as the item count increasing.

Does the page actually scroll to the bottom? Do you visibly see the new tweets load? — Amey, Jan 29 '13 at 16:32

score 0 · Answer 2 · answered Oct 21 '17 at 11:29

The condition in the while loop was the issue for my use case: it was an infinite loop. I fixed the problem by keeping a history of item counts and stopping once the count no longer grows:

import time

def get_items(items):
    # History of item counts, seeded so that the first loop check (1 > 0) passes
    item_nb = [0, 1]

    # Exit the loop once a pass finds no more items than the previous one
    while item_nb[-1] > item_nb[-2]:
        wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(5)  # give the newly requested items time to load
        items = wd.find_elements_by_class_name('stream-item')
        item_nb.append(len(items))

    return items
