
https://www.narendramodi.in/category/text-speeches -> I want to scrape this page. Since it loads content dynamically, I need to scroll to the bottom of the page and then grab the HTML to scrape it. But when the website is opened through the Selenium Chrome driver, it does not load more content as I scroll down, whether I scroll manually or from the script. When the same page is opened in normal Chrome, it works just fine. I also tried the Firefox driver, with the same result. Here is the code I have tried:

import time

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path=r'C:/tools/drivers/chromedriver.exe')
driver.get('https://www.narendramodi.in/news')
# https://stackoverflow.com/a/27760083

SCROLL_PAUSE_TIME = 2.0

# Get the initial scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
print(last_height)

while True:
    time.sleep(SCROLL_PAUSE_TIME)

    # Scroll down to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait for the next batch of content to load
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate the new scroll height and compare it with the last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    print(new_height)
    if new_height == last_height:
        break
    last_height = new_height


res = driver.execute_script("return document.documentElement.outerHTML")
driver.quit()

soup = BeautifulSoup(res, 'lxml')

How can I scrape this entire page?

Sai Sandeep
  • Can you not just scrape the API that populates the page rather than using selenium? – Iain Shelvington Feb 02 '20 at 15:43
  • It seems like infinite scroll, but you can refer to the following link: https://stackoverflow.com/questions/59838948/scraping-javascript-table-with-a-scroll-using-selenium/59839533#59839533 – Yun Feb 06 '20 at 10:58

1 Answer


Some websites detect the use of Selenium and stop loading their content. You can try tuning your Selenium settings or using a package such as selenium-stealth (PyPI: https://pypi.org/project/selenium-stealth/).
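
As a rough sketch of both approaches (the Chrome options and stealth parameters below are common defaults taken from general anti-detection practice, not settings verified against this particular site), you could start the driver like this and then reuse the scroll loop from the question:

# A minimal sketch: start Chrome with obvious automation flags hidden and
# apply selenium-stealth. The option values and stealth parameters are
# illustrative defaults, not requirements of this specific website.
from selenium import webdriver
from selenium_stealth import stealth

options = webdriver.ChromeOptions()
# Reduce automation fingerprints exposed by Chrome itself
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)

driver = webdriver.Chrome(executable_path=r'C:/tools/drivers/chromedriver.exe',
                          options=options)

# Patch navigator properties (webdriver flag, vendor, WebGL strings, etc.)
stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True)

driver.get('https://www.narendramodi.in/news')
# ...then run the same scroll loop as in the question.

Whether the lazy loading actually resumes depends on what the site checks for; selenium-stealth only covers the most common fingerprints.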