
I am trying to get all the hotels, but even though I have executed the scroll-down script, my page_source shows only the HTML containing the 11 hotels that were loaded initially.

How can I get the source code of the entire page after scrolling down, so that I can scrape all the hotels?

If driver.execute_script is loading the entire page, then how do I store the page source of the entire page in my variable?

PS: this is just for educational purposes.

from selenium import webdriver
import re
import pandas as pd
import time
chrome_path = r"C:\Users\ajite\Desktop\web scraping\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get('https://www.makemytrip.com/mmthtl/site/hotels/search?checkin=02252018&checkout=02262018&roomStayQualifier=1e0e&city=GOI&searchText=Goa,%20India&country=IN')

driver.implicitly_wait(3)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)

two_hotels = driver.find_elements_by_xpath('//*[@id="hotel_card_list"]/div')
Jason Aller

1 Answer

Your scroll runs, but a single jump to the bottom does not trigger the page's lazy loading of the intermediate cards. Instead of:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") 

you should try:

for i in range(0,25): # here you will need to tune to see exactly how many scrolls you need
  driver.execute_script('window.scrollBy(0, 400)')
  time.sleep(1)
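If the number of scrolls is hard to guess, another option is to keep scrolling until the page height stops growing. A minimal sketch, with the scroll and measurement steps passed in as callables so the loop itself is browser-agnostic; the Selenium wiring in the comment assumes a `driver` object like the one above:

```python
import time

def scroll_until_stable(scroll, measure, pause=1.0, max_rounds=50):
    """Repeat scroll() until measure() stops increasing.

    scroll  -- performs one scroll step, e.g. a window.scrollBy call
    measure -- returns the current page height, e.g. document.body.scrollHeight
    Returns the number of scroll rounds actually performed.
    """
    last = measure()
    for rounds in range(1, max_rounds + 1):
        scroll()
        time.sleep(pause)        # give the lazy loader time to fetch more cards
        current = measure()
        if current == last:      # height unchanged: nothing new was loaded
            return rounds
        last = current
    return max_rounds

# With Selenium it could be wired up like this:
# scroll_until_stable(
#     lambda: driver.execute_script("window.scrollBy(0, 400)"),
#     lambda: driver.execute_script("return document.body.scrollHeight"),
# )
```

Once the loop finishes, `driver.page_source` reflects everything that has been loaded so far, so that is the point at which to store it in a variable.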

The code I tried:

import selenium
import time
from selenium import webdriver
driver = webdriver.Chrome()

driver.get("https://www.makemytrip.com/mmthtl/site/hotels/search?checkin=02252018&checkout=02262018&roomStayQualifier=1e0e&city=GOI&searchText=Goa,%20India&country=IN")
driver.implicitly_wait(3)

for i in range(0,25): # here you will need to tune to see exactly how many scrolls you need
  driver.execute_script('window.scrollBy(0, 400)')
  time.sleep(1)

time.sleep(10) #more time so the cards will load

two_hotels = driver.find_elements_by_xpath('//*[@id="hotel_card_list"]/div')

two_hotels now contains more elements.


With range(0, 25) I got 42 hotels; you may need to tune the number of scroll iterations to load everything.
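Since the question imports pandas, the end goal is presumably a table. A minimal sketch of turning the cards' visible text into a DataFrame; the assumption that the hotel name is the first line of each card's text is mine, not something the page guarantees:

```python
import pandas as pd

def hotels_to_frame(card_texts):
    """Build a one-column DataFrame of hotel names from card texts.

    Assumes (layout-dependent) that the first line of each non-empty
    card's visible text is the hotel name.
    """
    names = [text.splitlines()[0].strip()
             for text in card_texts if text.strip()]
    return pd.DataFrame({"hotel": names})

# With Selenium: hotels = hotels_to_frame([el.text for el in two_hotels])
```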

Eduard Florinescu