
https://www.narendramodi.in/category/text-speeches -> I want to scrape this page. Since it loads content dynamically, I need to scroll to the bottom of the page and then grab the HTML to scrape it. But when the website is opened through the Selenium Chrome driver, it does not load more content as I scroll down, whether I scroll manually or from the script. When the same page is opened in normal Chrome, it works just fine. I also tried the Firefox driver, with the same result. Here is the code I have tried:

import time

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path=r'C:/tools/drivers/chromedriver.exe')
driver.get('https://www.narendramodi.in/news')
# https://stackoverflow.com/a/27760083

SCROLL_PAUSE_TIME = 2.0

# Get the initial scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
print(last_height)

while True:
    time.sleep(SCROLL_PAUSE_TIME)

    # Scroll down to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait for the next batch of content to load
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate the new scroll height and compare it with the last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    print(new_height)
    if new_height == last_height:
        break
    last_height = new_height


res = driver.execute_script("return document.documentElement.outerHTML")
driver.quit()

soup = BeautifulSoup(res, 'lxml')

How can I scrape this entire page?

Sai Sandeep
  • Can you not just scrape the API that populates the page rather than using selenium? – Iain Shelvington Feb 02 '20 at 15:43
  • It seems like infinite scroll, but you can refer to the following link: https://stackoverflow.com/questions/59838948/scraping-javascript-table-with-a-scroll-using-selenium/59839533#59839533 – Yun Feb 06 '20 at 10:58

1 Answer


Some websites detect the use of Selenium and stop loading their content. You can try tuning your Selenium settings or using a package such as selenium-stealth (PyPI: https://pypi.org/project/selenium-stealth/).
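
As a rough sketch of both approaches (the Chrome options and stealth parameters below are common defaults taken from general anti-detection practice, not settings verified against this particular site), you could start the driver like this and then reuse the scroll loop from the question:

# A minimal sketch: start Chrome with obvious automation flags hidden and
# apply selenium-stealth. The option values and stealth parameters are
# illustrative defaults, not requirements of this specific website.
from selenium import webdriver
from selenium_stealth import stealth

options = webdriver.ChromeOptions()
# Reduce automation fingerprints exposed by Chrome itself
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)

driver = webdriver.Chrome(executable_path=r'C:/tools/drivers/chromedriver.exe',
                          options=options)

# Patch navigator properties (webdriver flag, vendor, WebGL strings, etc.)
stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True)

driver.get('https://www.narendramodi.in/news')
# ...then run the same scroll loop as in the question.

Whether the lazy loading actually resumes depends on what the site checks for; selenium-stealth only covers the most common fingerprints.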