I am trying to scrape a news page (thenextweb.com) that uses infinite scrolling.
I have written a function to scroll down, but it takes too much time because I had to add time.sleep() calls:
my internet connection is weak and the new pages take a while to load.
Here is my scroll-down function; it is based on the solution from this question: "https://stackoverflow.com/questions/20986631/how-can-i-scroll-a-web-page-using-selenium-webdriver-in-python"
def scrolldown(urltoscroll):
    browser.get(urltoscroll)
    last_height = browser.execute_script("return document.body.scrollHeight")
    next_button = browser.find_element_by_xpath('//*[@id="channelPaginate"]')
    while True:
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(6)
        next_button.click()
        time.sleep(8)
        new_height = browser.execute_script("return document.body.scrollHeight")
        time.sleep(6)
        if new_height == last_height:
            break
        last_height = new_height
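I assume most of the wasted time comes from the fixed time.sleep() calls. Below is a rough sketch of the same loop using explicit waits (WebDriverWait) instead, so it only waits until the page height actually grows; the 30-second timeout is an arbitrary value I picked and the XPath is the same one from my code, but I am not sure this is the right approach:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException

    def scrolldown_with_waits(browser, urltoscroll):
        browser.get(urltoscroll)
        wait = WebDriverWait(browser, 30)  # returns as soon as the condition holds; 30 s is just a guess
        last_height = browser.execute_script("return document.body.scrollHeight")
        while True:
            browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            # re-locate the button each round in case the page re-renders it
            next_button = wait.until(
                EC.element_to_be_clickable((By.XPATH, '//*[@id="channelPaginate"]')))
            next_button.click()
            try:
                # wait until the page height actually grows instead of sleeping a fixed time
                wait.until(lambda d: d.execute_script(
                    "return document.body.scrollHeight") > last_height)
            except TimeoutException:
                break  # height never changed, so I assume there are no more pages
            last_height = browser.execute_script("return document.body.scrollHeight")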
Is there an easier way to handle this kind of page?
Thank you
Edit: the page I want to scrape is "https://thenextweb.com/plugged/". I want to get the article hrefs.
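Once the scrolling is done, I am planning to collect the hrefs roughly like this; the "article a" CSS selector is only a guess and I still need to check the actual markup of the page:

    from selenium.webdriver.common.by import By

    def collect_article_hrefs(browser):
        # NOTE: "article a" is an assumption about the markup, not verified against the site
        links = browser.find_elements(By.CSS_SELECTOR, "article a")
        hrefs = {a.get_attribute("href") for a in links if a.get_attribute("href")}
        return sorted(hrefs)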