How to scroll correctly in a dynamically-loading webpage with Selenium?

Question

Here's the link to the website: website

I would like to get all the links to the hotels in this location.

Here's my script:

import pandas as pd
import numpy as np
from selenium import webdriver
import time

PATH = r"driver\chromedriver.exe"

options = webdriver.ChromeOptions() 
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1200,900")
options.add_argument('enable-logging')


driver = webdriver.Chrome(options=options, executable_path=PATH)

driver.get('https://fr.hotels.com/search.do?destination-id=10398359&q-check-in=2021-06-24&q-check-out=2021-06-25&q-rooms=1&q-room-0-adults=2&q-room-0-children=0&sort-order=BEST_SELLER')

try:
    cookie = driver.find_element_by_xpath('//button[@class="uolsaJ"]')
    cookie.click()
except Exception:
    pass

for i in range(30):
    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(5)

time.sleep(5)

my_elems = driver.find_elements_by_xpath('//a[@class="_61P-R0"]')

links = [my_elem.get_attribute("href") for my_elem in my_elems]


X = np.array(links)
print(X.shape)
#driver.close()

But I cannot find a way to tell the script: scroll down until there is nothing more to scroll.

I tried changing these parameters:

for i in range(30):
    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(30)

I changed the time.sleep() duration, the scroll step of 1000 and so on, but my output keeps changing, and not in the right way.

output

As you can see, I scraped a different number of links on each run. How can I make my script scrape the same amount each time? Not necessarily every link, but at least a stable number.

Here it scrolls, at some point it seems to get stuck, and then it scrapes all the links it has at that moment. That's not what I want.

Prophet · Accepted Answer · 2021-06-25T08:40:21.643

2

There are several issues here:

You are collecting the elements and their links only AFTER you finish scrolling, while you should do that inside the scrolling loop.
You should wait until the cookie banner appears before closing it.
You can scroll until the footer element is displayed.
Something like this:

import pandas as pd
import numpy as np
from selenium import webdriver
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

PATH = r"driver\chromedriver.exe"

options = webdriver.ChromeOptions() 
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1200,900")
options.add_argument('enable-logging')


driver = webdriver.Chrome(options=options, executable_path=PATH)
wait = WebDriverWait(driver, 20)

driver.get('https://fr.hotels.com/search.do?destination-id=10398359&q-check-in=2021-06-24&q-check-out=2021-06-25&q-rooms=1&q-room-0-adults=2&q-room-0-children=0&sort-order=BEST_SELLER')

wait.until(EC.visibility_of_element_located((By.XPATH, '//button[@class="uolsaJ"]'))).click()

def is_element_visible(xpath):
    wait1 = WebDriverWait(driver, 2)
    try:
        wait1.until(EC.visibility_of_element_located((By.XPATH, xpath)))
        return True
    except Exception:
        return False

while not is_element_visible("//footer[@id='footer']"):
    my_elems = driver.find_elements_by_xpath('//a[@class="_61P-R0"]')

    links = [my_elem.get_attribute("href") for my_elem in my_elems]

    X = np.array(links)
    print(X.shape)

    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(5)


#driver.close()
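A sketch of a different stop condition from the footer check above, in case no element reliably appears only at the end: keep scrolling until `document.body.scrollHeight` stops growing for a couple of consecutive scrolls, and collect links into a set inside the loop (as the answer's first point suggests) so repeated passes don't inflate the count. The names `scroll_until_stable`, `selenium_callbacks` and `SimulatedPage` are hypothetical, the `_61P-R0` link class is taken from the question (it looks like a generated class name and has probably changed since), and the simulated page only exists so the loop can be tried without a browser.

```python
import time
from typing import Callable, Iterable, Set


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    collect_links: Callable[[], Iterable[str]],
    pause: float = 2.0,
    patience: int = 2,
    max_rounds: int = 100,
) -> Set[str]:
    """Scroll until the page height stops growing, collecting links on every pass.

    `patience` is how many consecutive scrolls with no height change count as
    "nothing more to load"; `max_rounds` is a hard cap so the loop always ends.
    """
    links: Set[str] = set()
    last_height = get_height()
    unchanged = 0
    for _ in range(max_rounds):
        links.update(collect_links())  # collect inside the loop; the set drops duplicates
        scroll_to_bottom()
        time.sleep(pause)  # give the page time to append the next batch
        new_height = get_height()
        if new_height == last_height:
            unchanged += 1
            if unchanged >= patience:
                break
        else:
            unchanged = 0
            last_height = new_height
    links.update(collect_links())  # pick up anything loaded by the last scroll
    return links


def selenium_callbacks(driver, link_xpath='//a[@class="_61P-R0"]'):
    """Wrap a Selenium driver in the three callables used above (hypothetical helper)."""
    from selenium.webdriver.common.by import By  # imported lazily; only needed with a real driver

    def get_height() -> int:
        return driver.execute_script("return document.body.scrollHeight")

    def scroll_to_bottom() -> None:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    def collect_links() -> list:
        return [a.get_attribute("href") for a in driver.find_elements(By.XPATH, link_xpath)]

    return get_height, scroll_to_bottom, collect_links


class SimulatedPage:
    """Offline stand-in for an infinite-scroll page: 10 more results per scroll, 35 in total."""

    def __init__(self, total: int = 35, batch: int = 10):
        self.total, self.batch, self.loaded = total, batch, batch

    def height(self) -> int:
        return self.loaded * 100

    def scroll(self) -> None:
        self.loaded = min(self.total, self.loaded + self.batch)

    def links(self) -> list:
        return [f"https://example.com/hotel/{i}" for i in range(self.loaded)]


page = SimulatedPage()
found = scroll_until_stable(page.height, page.scroll, page.links, pause=0)
print(len(found))  # 35
```

With a real driver this would be something like `links = scroll_until_stable(*selenium_callbacks(driver), pause=3)` once the cookie banner has been closed. The `patience` and `max_rounds` limits cost a few extra seconds but guarantee the loop terminates even if the site keeps loading slowly.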

edited Jun 25 '21 at 08:40

answered Jun 25 '21 at 08:16

Prophet


Thanks Prophet, always here to help :) I will check your code asap – RandallCloud Jun 25 '21 at 08:22
I have updated the answer since I think there was a problem there. Now it should be better. One day I will learn Python :) – Prophet Jun 25 '21 at 08:41
`while not find_elements_by_xpath("//footer[@id='footer']"): NameError: name 'find_elements_by_xpath' is not defined` – RandallCloud Jun 25 '21 at 11:34
@RandallCloud I fixed that more than 3 hours ago... See the updated answer – Prophet Jun 25 '21 at 11:43
It seems that doesn't do anything. The page doesn't scroll and the script ends with nothing. – RandallCloud Jun 25 '21 at 11:50
If so, you need to use some other XPath locator there. I used that one because I thought that element would not be visible until you scroll to the end of the list. From my location your page doesn't show any search results at all... – Prophet Jun 25 '21 at 11:53
It should be an element that is not visible until you scroll to the bottom. – Prophet Jun 25 '21 at 11:54
I think you had no search results because the date has expired. – RandallCloud Jun 25 '21 at 12:05
Right! But now I see that the footer is not visible during the scrolling... – Prophet Jun 25 '21 at 12:09
I put: `while not is_element_visible("//div[@id='105']"):` Each hotel has its own id, so I figured that was the way to specify my own number. For now, the script is running. – RandallCloud Jun 25 '21 at 12:14
So, I hope now you can accept my answer? – Prophet Jun 25 '21 at 13:12
Yeah of course! Thanks a lot Prophet, that's not the first time you've helped me :) – RandallCloud Jun 25 '21 at 14:00
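Much of the run-to-run variation described in the question likely comes from the fixed `time.sleep()` calls: if a batch of results takes longer to arrive than the sleep, the script moves on without it. A hedged alternative is to count the result links after each scroll and only give up when nothing new appears within a timeout. `wait_for_more_results` and `selenium_result_counter` are hypothetical helper names, and the XPath is the one from the question.

```python
import time
from typing import Callable


def wait_for_more_results(
    count_results: Callable[[], int],
    previous: int,
    timeout: float = 10.0,
    poll: float = 0.5,
) -> int:
    """Poll count_results() until it exceeds `previous` or `timeout` seconds pass.

    Returns the latest count; a result equal to `previous` means nothing new loaded
    within the timeout, which is a reasonable signal to stop scrolling.
    """
    deadline = time.monotonic() + timeout
    current = count_results()
    while current <= previous and time.monotonic() < deadline:
        time.sleep(poll)
        current = count_results()
    return current


def selenium_result_counter(driver, xpath='//a[@class="_61P-R0"]'):
    """Return a callable that counts the result links currently in the DOM."""
    from selenium.webdriver.common.by import By  # only needed with a real driver

    def count_results() -> int:
        return len(driver.find_elements(By.XPATH, xpath))

    return count_results


# Simulated counts: two polls see 5 results, the third sees 8 after a batch "loads".
print(wait_for_more_results(iter([5, 5, 8]).__next__, previous=5, poll=0))  # 8
```

In a scroll loop you would record the current `count`, scroll, call `wait_for_more_results(counter, previous=count)`, and stop once the returned value equals `count`. Selenium's `WebDriverWait(driver, timeout).until(...)` also accepts any callable that takes the driver, so the same condition can be written with it; the difference is that it raises `TimeoutException` on timeout instead of returning the old count.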

Dmitriy Zub · Answer 2 · 2021-07-07T06:18:15.577

1

You can try this by querying the DOM directly: locate some element that only appears at the bottom of the page and check it with Selenium's .is_displayed() method, which returns True/False:

# https://stackoverflow.com/a/57076690/15164646
while True:
    # is_displayed() returns False while the element exists but is not yet visible
    # (find_element raises NoSuchElementException if it is not in the DOM at all)
    # "#message" is the "No more results" element at the bottom of the YouTube search
    end_result = driver.find_element_by_css_selector('#message').is_displayed()
    driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")

    # further code below

    # once the element is displayed, is_displayed() returns True and we break out of the while loop
    if end_result:
        break

I wrote a blog post where I used this method to scrape YouTube Search.
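One caveat with the snippet above: `find_element...` raises `NoSuchElementException` when the marker is not in the DOM at all, and nothing bounds the number of iterations. The sketch below is a hedged variant of the same idea: a missing marker counts as "not visible yet", a `max_rounds` cap keeps the loop finite, and there is a pause after each scroll so new results can render. The helper names are hypothetical, and `#message` is the YouTube selector from the answer; on another site it would need to be replaced with that site's end-of-list element.

```python
import time
from typing import Callable


def scroll_until_marker(
    marker_visible: Callable[[], bool],
    scroll_to_bottom: Callable[[], None],
    pause: float = 1.0,
    max_rounds: int = 200,
) -> bool:
    """Scroll until an end-of-results marker is displayed.

    Returns True if the marker showed up, False if `max_rounds` scrolls were used
    up first, so the loop cannot run forever.
    """
    for _ in range(max_rounds):
        if marker_visible():
            return True
        scroll_to_bottom()
        time.sleep(pause)  # let the next batch render before checking again
    return marker_visible()


def selenium_marker_check(driver, css_selector="#message"):
    """Treat a marker that is not in the DOM yet as "not visible" instead of crashing."""
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    def marker_visible() -> bool:
        try:
            return driver.find_element(By.CSS_SELECTOR, css_selector).is_displayed()
        except NoSuchElementException:
            return False

    return marker_visible


def selenium_scroll(driver):
    """Scroll the document to its current bottom (equivalent JavaScript to the answer's)."""

    def scroll_to_bottom() -> None:
        driver.execute_script(
            "var el = document.scrollingElement || document.body;"
            "el.scrollTop = el.scrollHeight;"
        )

    return scroll_to_bottom


# Simulated marker: hidden for two checks, displayed on the third.
print(scroll_until_marker(iter([False, False, True]).__next__, lambda: None, pause=0))  # True
```

With a real driver the call would look like `scroll_until_marker(selenium_marker_check(driver), selenium_scroll(driver))`, with your own scraping code run after it returns or folded into the scroll callable.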

edited Jul 07 '21 at 06:18

answered Jun 25 '21 at 10:21

Dmitriy Zub


The script seems to run endlessly? – RandallCloud Jun 25 '21 at 11:50
Indeed! Thank you for letting me know! As soon as it's fixed I'll add another comment here so you know. – Dmitriy Zub Jun 25 '21 at 16:32
Hey @RandallCloud! I updated the answer. Now it uses `break` to exit the `while` loop when the element at the bottom of the page is located. – Dmitriy Zub Jul 07 '21 at 06:19
