Note: This code is incomplete and needs a manual step partway through, so it should only be run in Jupyter. I am trying to get the last page number of a TripAdvisor hotel listing page.
The "Malaysia" and "Switzerland" pages work fine (URLs commented out below), but the "Hong Kong" one does not.
from selenium import webdriver  # for navigating through the pages
# Raw string, so each backslash is written once
driver = webdriver.Chrome(executable_path=r'C:\Users\user\Downloads\chromedriver.exe')
url = "https://www.tripadvisor.com.sg/Hotels-g294217-Hong_Kong-Hotels.html"
#url = "https://www.tripadvisor.com.sg/Hotels-g293951-Malaysia-Hotels.html"
#url = "https://www.tripadvisor.com.sg/Hotels-g188045-Switzerland-Hotels.html"
driver.get(url)
driver.implicitly_wait(5)
Human intervention here: click an arbitrary "Check in date" and "Check out date", then click "Update".
last_page_s = driver.find_element_by_css_selector("span.pageNum.last").get_attribute('data-page-number')
last_page = int(last_page_s)
print(last_page)
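One way to narrow this down might be to check whether any element on the page carries a `data-page-number` attribute at all, regardless of its tag or class. The sketch below uses only the standard library and is not part of the code above: `PageNumberCollector`, `last_page_from_html`, and the sample markup are hypothetical names and data for illustration, and it assumes the pagination elements still expose `data-page-number`, which may not hold for the real page.

```python
from html.parser import HTMLParser


class PageNumberCollector(HTMLParser):
    """Collects every integer data-page-number attribute found in the HTML."""

    def __init__(self):
        super().__init__()
        self.page_numbers = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        for name, value in attrs:
            if name == "data-page-number" and value and value.isdigit():
                self.page_numbers.append(int(value))


def last_page_from_html(html):
    """Return the largest data-page-number in html, or None if there is none."""
    collector = PageNumberCollector()
    collector.feed(html)
    return max(collector.page_numbers) if collector.page_numbers else None


# Hypothetical markup, only for illustration; the real page may differ.
sample = (
    '<div class="pageNumbers">'
    '<a class="pageNum first" data-page-number="1">1</a>'
    '<a class="pageNum last" data-page-number="37">37</a>'
    '</div>'
)
print(last_page_from_html(sample))
```

In the notebook, this could be called as `last_page_from_html(driver.page_source)` after the manual step. A result of `None` would suggest the attribute is missing on that page, while a number would suggest the `span.pageNum.last` selector itself does not match.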
I'm still a newbie at web scraping, so any help is greatly appreciated!