I am trying to scrape a Dropbox folder using Selenium with Python. If I try to select all hyperlinks (or all elements containing hyperlinks), I only get the first 20 or so results. Here is a minimal working example:

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
page = 'https://www.dropbox.com/FolderName'
browser.get(page)

elementlist = browser.find_elements(By.CLASS_NAME, 'brws-file-name-cell-filename')
# Alternatively, browser.find_elements(By.TAG_NAME, 'a') yields similar results
elength = len(elementlist)

Usually, elength is on the order of 20 to 30 elements, which grows to 30 to 40 if I add a command to scroll down to the bottom of the page. I know for a fact that there are well over 200 elements in the folder I am trying to scrape. My question is thus: is there any way to scroll down the page progressively, rather than going all the way to the bottom right away? Many questions on this topic focus on pages with infinite loading, like Facebook or other social media. My page, on the other hand, has a fixed length. Is there a way I can scroll down step by step, rather than all at once?

UPDATE

I tried following the advice given to me by the community and by the answer you can find here. Unfortunately, I am still struggling to iterate over the height, which is my variable of interest and which seems to be stuck in a string. This has been my best attempt at creating a for loop over the height, and needless to say, it still did not work.

# Get current height
height =  browser.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down
    browser.execute_script('window.scrollTo(0, window.scroll'+str(height)+' + 200)')

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == height:
        break
    else:
        height = new_height

UPDATE 2

I think I've found the issue. Dropbox basically has a 'page within the page' structure. The whole of the page is visible to me, but there's an inner archive which I need to navigate. Any idea how to do that?

Anonymous
  • Does this answer your question? [How can I scroll a web page using selenium webdriver in python?](https://stackoverflow.com/questions/20986631/how-can-i-scroll-a-web-page-using-selenium-webdriver-in-python) – 0buz May 06 '20 at 15:27
  • I just tried using the methods recommended to me in that thread. It doesn't seem to work, one way or another. Not even with the window.scrollY command. – Anonymous May 07 '20 at 14:46

2 Answers

You could try this answer. Instead of jumping straight to the bottom, you could write a loop that scrolls down by a fixed height on each iteration until it reaches the bottom.
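A minimal sketch of that idea, assuming the page itself (not an inner container) is what scrolls. The function name scroll_in_steps and the step, pause, and max_steps parameters are made up for illustration; driver is any Selenium WebDriver instance, and only its execute_script method is used.

```python
import time


def scroll_in_steps(driver, step=200, pause=0.5, max_steps=1000):
    """Scroll the window down `step` pixels at a time until the bottom.

    Hypothetical helper: the parameter names and defaults are assumptions.
    """
    position = 0
    for _ in range(max_steps):
        # Re-read the height on every pass in case more content has loaded.
        height = driver.execute_script("return document.body.scrollHeight")
        if position >= height:
            break
        position += step
        # Pass the target offset as an argument instead of building a string.
        driver.execute_script("window.scrollTo(0, arguments[0]);", position)
        time.sleep(pause)
    return position
```

With the browser object from the question, this would be called as scroll_in_steps(browser) before collecting the elements. Passing the offset through arguments[0] avoids the string concatenation that caused trouble in the question.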

Juanje

browser.execute_script('window.scrollTo(0, window.scroll'+str(height)+' + 200)')

The second argument passed to the JavaScript scrollTo call looks wrong to me. Let's assume your height variable is 800. Then this is the JavaScript that execute_script (the Selenium method that lets you run JavaScript in the page) ends up executing:

window.scrollTo(0, window.scroll800 + 200)

window.scroll800 is not a real property, so the expression does not produce a valid scroll position, and I assume the scroll will fail. I think you should change your code to this:

browser.execute_script('window.scrollTo(0,'+str(height)+' + 200)')

This code will scroll your window to the bottom of the page. (One tip: you can open the devtools console in your browser and try the JavaScript there first. If it works, you can come back to Selenium.) At this point you should make your script sleep so the page has enough time to load. Once it has loaded, assign the new height value to a new variable. If the page has loaded more elements at the bottom, the first height and the new height will differ, which means another scroll to the bottom is needed. Before scrolling again, assign the new height to the first height variable, so that in the next iteration the first height is the new height from the previous one.
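The loop described above could be sketched roughly as follows. The function name scroll_until_stable and the pause and max_rounds parameters are assumptions for illustration; driver is the Selenium WebDriver instance, and only execute_script is used.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=100):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Hypothetical helper: names and defaults are assumptions.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        # Jump to the current bottom of the page.
        driver.execute_script("window.scrollTo(0, arguments[0]);", last_height)
        # Give the page time to load any additional content.
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so this is the real bottom
        last_height = new_height
    return last_height
```

The max_rounds cap is only a safety net so the loop cannot run forever on a page that keeps growing.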

Ozan Yılmaz
  • So the code I posted in the edit is fundamentally functional, I just need to change the syntax of that particular command. Correct? – Anonymous May 08 '20 at 08:53
  • Exactly, just try the scroll function on browser console first and then you will see. – Ozan Yılmaz May 08 '20 at 19:50
  • try document.getElementBy***(Element's className,ID or JS path here depending on which method you choose).scrollTo(values here) as an argument to execute_script. With getElementBy***, you should choose the element you want to scroll. – Ozan Yılmaz May 12 '20 at 18:39
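For the 'page within the page' case from UPDATE 2, the last comment's suggestion could look roughly like the sketch below, which scrolls an inner element instead of the window. This is an assumption-laden illustration: scroll_container and its parameters are hypothetical names, and container stands for whichever scrollable element wraps the file list, which you would have to locate yourself (the thread does not say which element that is on Dropbox).

```python
import time


def scroll_container(driver, container, step=300, pause=0.5, max_steps=1000):
    """Scroll a scrollable element down `step` pixels at a time.

    Hypothetical helper: `container` is a WebElement you located yourself.
    """
    position = 0
    for _ in range(max_steps):
        # scrollHeight of the element, not of document.body.
        height = driver.execute_script(
            "return arguments[0].scrollHeight;", container
        )
        if position >= height:
            break
        position += step
        # Setting scrollTop moves the element's own scrollbar.
        driver.execute_script(
            "arguments[0].scrollTop = arguments[1];", container, position
        )
        time.sleep(pause)
    return position
```

If the list only keeps the currently visible rows in the DOM, it may be necessary to collect the file names after each step rather than once at the end.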