I recently asked a question (referenced here: Python Web Scraping (Beautiful Soup, Selenium and PhantomJS): Only scraping part of full page) that helped identify a problem I had scraping all the contents of a page that updates dynamically as one scrolls. However, I am still unable to get my code to point to the correct element with Selenium and scroll down the page iteratively. I also found that when I manually scroll down the page in question, some of the content that was present when the page first loaded disappears as the new content loads. For example, look at the image below...
The container holding the data I am trying to scrape is highlighted in blue in the image.
First, I am having trouble selecting the right element to scroll, as I have never had to do this before. I believe I need to use Selenium to target the container and then use the "execute_script" function to scroll it, because this table is embedded within the body of the web page. However, I can't seem to get that to work:
scroll = driver.find_element_by_class_name("ag-body-viewport")
driver.execute_script("arguments[0].scrollIntoView();", scroll)
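One possible direction, offered as a sketch rather than a confirmed fix: scrollIntoView() brings the element itself into view but does not move the content inside it, so scrolling the table probably means changing the container's scrollTop instead. The helper name scroll_container_by is hypothetical, and it assumes "ag-body-viewport" is the element that actually owns the scrollbar.

```python
def scroll_container_by(driver, container, pixels):
    """Scroll the content inside `container` (not the page) down by `pixels`.

    `driver` is a Selenium WebDriver and `container` a WebElement.
    Returns the container's scrollTop after the scroll, as reported by the browser.
    """
    return driver.execute_script(
        # arguments[0] is the element, arguments[1] the pixel offset
        "arguments[0].scrollTop += arguments[1]; return arguments[0].scrollTop;",
        container,
        pixels,
    )


# Usage (assumes `driver` already has the page loaded):
# container = driver.find_element_by_class_name("ag-body-viewport")
# scroll_container_by(driver, container, 400)
```

On Selenium 4.3 and later the find_element_by_* helpers were removed, so the lookup would be driver.find_element(By.CLASS_NAME, "ag-body-viewport") with By imported from selenium.webdriver.common.by.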
Second, once I have the ability to scroll, I will need to scroll down a little at a time and scrape iteratively. What I mean is that, if you look in the image you will see a bunch of 'div' tags inside of the
For example, when the page loads and I pass the HTML to BeautifulSoup, I can scrape the first 40 rows. If I then scroll down by, say, 40 rows, I can pass rows 40 - 80 to BeautifulSoup, but rows 1 - 40 are no longer available because the data has dynamically updated...
Long story short, I want to scrape all the content shown in the image, then use Selenium to scroll down roughly 40 rows, scrape the next 40, then scroll down and scrape the next 40, and so on. Any tips on how to get Selenium to scroll within this embedded container? And how would one go about scrolling down iteratively in order to capture all the data in the container when it updates dynamically as you scroll? Any extra help will be much appreciated.
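To make the loop I have in mind concrete, here is a rough sketch under several assumptions. The function scrape_all_rows and the parse_rows callable are hypothetical names. parse_rows is assumed to take the container's HTML and return (key, row) pairs, where key is whatever uniquely identifies a row, so rows seen twice in overlapping scrolls are kept only once. The loop stops when scrollTop no longer changes, which I am assuming means the bottom has been reached.

```python
import time


def scrape_all_rows(driver, container, parse_rows,
                    step_px=400, pause=0.5, max_steps=1000):
    """Scroll `container` in small steps, collecting each visible batch of rows.

    parse_rows(html) must return an iterable of (key, row) pairs.
    Rows are returned in the order they were first seen.
    """
    seen = {}       # key -> row; dicts keep insertion order
    last_top = -1   # scrollTop after the previous scroll
    for _ in range(max_steps):
        # Scrape whatever rows are currently rendered in the container.
        html = container.get_attribute("innerHTML")
        for key, row in parse_rows(html):
            seen.setdefault(key, row)
        # Scroll the container itself and read back its new position.
        top = driver.execute_script(
            "arguments[0].scrollTop += arguments[1];"
            "return arguments[0].scrollTop;",
            container, step_px,
        )
        if top == last_top:
            break   # position did not move: assume the end was reached
        last_top = top
        time.sleep(pause)  # give the page time to render the new rows
    return list(seen.values())
```

The step size should be smaller than the height of the rows rendered at once, otherwise rows between two scroll positions could be skipped. A fixed sleep is the simplest option here; an explicit wait on the new rows would likely be more robust.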