7

I have a dynamic page that loads more products as the user scrolls down. I want to get the total number of products rendered on the page. Currently I use the following code to keep scrolling to the bottom until all the products are displayed.

# Count the products that are present after the initial load.
elems = WebDriverWait(self.driver, 30).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "x")))
print(len(elems))
a = len(elems)
# Scroll to the bottom once and give the next batch time to load.
self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(4)
elem1 = WebDriverWait(self.driver, 30).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "x")))
b = len(elem1)
# Keep scrolling for as long as each scroll adds more products.
while b > a:
    self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(4)
    elem1 = WebDriverWait(self.driver, 30).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "x")))
    a = b
    b = len(elem1)
print(b)

This works nicely, but I want to know whether there is a better way of doing this.

Martijn Pieters
Saheb
  • Presumably there's an endpoint that gets called when a suitable scroll occurs... Could you manipulate that in some way other than scrolling? – Jon Clements Feb 13 '14 at 11:41
  • possible duplicate of [Scroll Element into View with Selenium](http://stackoverflow.com/questions/3401343/scroll-element-into-view-with-selenium) – Erki M. Feb 13 '14 at 11:54
  • @Erki M. I guess this question is a bit different from the one you guys are referring to. Plus I need a solution with Python. [I don't have any idea of selenium with Java]. I tried using the javascript in that post. But it is not working. Error: "WebDriverException: Message: u'html is undefined'" – Saheb Feb 13 '14 at 12:13

3 Answers

8

You can perform this action easily using this line of code:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

And if you want to scroll down forever, you should try this:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

driver = webdriver.Firefox()
driver.get("https://twitter.com/BarackObama")

while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)

I am not sure about the time.sleep(x) value, because loading the data may take longer or shorter than that. For more information, please check the official documentation page.
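One way around guessing a sleep value is to poll the page height after each scroll and stop once it no longer grows. The following is only a sketch under some assumptions: `scroll_until_height_stable` and its parameters `max_scrolls`, `timeout` and `poll` are names made up for illustration, and it assumes that new content always increases `document.body.scrollHeight`, which may not hold on every site. It relies only on `driver.execute_script`, so it should work with any Selenium driver.

```python
import time


def scroll_until_height_stable(driver, max_scrolls=100, timeout=10.0, poll=0.25):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls performed. All names and default
    values here are illustrative assumptions, not part of Selenium.
    """
    get_height = "return document.body.scrollHeight;"
    last_height = driver.execute_script(get_height)
    for scrolls in range(1, max_scrolls + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Poll instead of sleeping a fixed time: leave as soon as the page grows.
        deadline = time.monotonic() + timeout
        new_height = last_height
        while time.monotonic() < deadline:
            new_height = driver.execute_script(get_height)
            if new_height > last_height:
                break
            time.sleep(poll)
        if new_height <= last_height:
            # Nothing new was loaded within the timeout, so assume we are done.
            return scrolls
        last_height = new_height
    # The cap protects against pages that can load practically forever.
    return max_scrolls
```

With a real browser this would be called as `scroll_until_height_stable(driver)` after `driver.get(...)`. The `max_scrolls` cap is there because some pages never run out of content.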

have fun :)

Ayyoub
  • Finally! I've been working on this for a while. I had this exact script, but without the loop; I still had a `sleep(3)` in there, so I guess it was only going to the bottom of the first page, i.e. what was already shown. So, thank you @Ayoub! As an aside, some sites may be big enough that you could potentially load forever (e.g. Twitter), so it might be smarter to bound the loop with `for i in range(100):` or something to that effect! – ntk4 Sep 24 '16 at 04:44
2

I think you could condense your code down to this:

prior = 0
while True:
    self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    current = len(WebDriverWait(self.driver, 30).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "x"))))
    if current == prior:
        return current
    prior = current

I did away with all the identical lines by moving them all into the loop, which necessitated making the loop a while True: and moving the condition checking into the loop (because unfortunately, Python lacks any do-while).

I also threw out the sleep and print statements - I'm not sure what their purpose was, but on my own page, I have found that the same number of elements load whether I sleep between scrolls or not. Further, in my own case, I don't need to know the count at any point, I just need to know when it has exhausted the list (but I added a return value so you can get the final count if you happen to need it). If you really want to print every intermediate count, you can print current right after it's assigned in the loop.
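Note that the `return` in the snippet above only works inside a function or method. As a rough sketch of how it might be wrapped up, the version below counts the elements through JavaScript and gives each scroll a short grace period, in case a page loads its next batch asynchronously. The name `count_loaded_elements` and the parameters `settle` and `poll` are hypothetical, and the grace period is my own assumption rather than something this answer requires.

```python
import time


def count_loaded_elements(driver, class_name, settle=5.0, poll=0.25):
    """Scroll until the number of elements with `class_name` stops growing.

    Returns the final count. `settle` (seconds to wait for growth) and
    `poll` (seconds between checks) are illustrative assumptions.
    """
    count_js = "return document.getElementsByClassName(arguments[0]).length;"
    prior = driver.execute_script(count_js, class_name)
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Re-count until the number grows or the grace period runs out.
        deadline = time.monotonic() + settle
        current = prior
        while current <= prior and time.monotonic() < deadline:
            time.sleep(poll)
            current = driver.execute_script(count_js, class_name)
        if current <= prior:
            # No growth after a full grace period: treat the list as exhausted.
            return current
        prior = current
```

From the question's code this would be called as `count_loaded_elements(self.driver, "x")`. Counting in the page avoids fetching every element handle on each pass just to take its length.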

ArtOfWarfare
1

If you have no idea how many elements might be added to the page, but you just want to get all of them, it might be good to loop thusly:

  • scroll down as described above
  • wait a few seconds
  • save the size of the page source (xxx.page_source)
  • if the size of the page source is larger than the last page source size saved, loop back and scroll down some more

I suppose that screenshot size might work fine too, depending upon the page you're loading, but this is working in my current program.
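The steps above could be sketched roughly as follows. This is a minimal illustration rather than the exact code from my program: `scroll_until_source_stable` and `pause` are made-up names, and it assumes `driver` is a Selenium WebDriver, whose `page_source` attribute returns the current page source as a string.

```python
import time


def scroll_until_source_stable(driver, pause=3.0):
    """Scroll down until len(driver.page_source) stops growing.

    Returns the final page source length. `pause` is the number of
    seconds to wait after each scroll (an assumed, tunable value).
    """
    last_size = len(driver.page_source)
    while True:
        # Scroll to the bottom and wait a few seconds for new content.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        size = len(driver.page_source)
        if size <= last_size:
            # The source did not grow, so assume everything has loaded.
            return size
        last_size = size
```

Once the function returns, all elements should be on the page and can be counted in the usual way. Keep in mind that the source size can also change for unrelated reasons, such as rotating ads, so this check is a heuristic.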

aomoore3