I'm writing a Python script that collects the URL of each results page: it records the current page's URL, moves to the next page, and repeats.

I can confirm that the browser starts and loads the start page, but after that nothing happens.

For example, the start page is:

`https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=1`

The URLs I want to extract are those of the following 4 pages:

・https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=1

・https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=2

・https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=3

・https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=4

I wrote the script below.

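from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import pandas as pd
from time import sleep


options = Options()
driver = webdriver.Chrome('path', options=options)


pageURL = 'https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/'
driver.get(pageURL)
sleep(3)


elem_urls = []


while True:
    # Record the URL of the page currently shown (one string per page).
    url = driver.current_url
    elem_urls.append(url)

    try:
        # Look up the "next" button; it is assumed to be absent on the last page.
        next_button = driver.find_element(By.CLASS_NAME, 'f-list-paging__next')
        next_button.click()
        sleep(3)
    except NoSuchElementException:
        # No "next" button found: the last page has been reached.
        break

    # Stop if the click did not lead to a different page.
    if driver.current_url == url:
        break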
hafuuu
  • What is the problem you are facing? What's going wrong when you execute this? – M B Apr 04 '22 at 07:27
  • Thanks for your comment. When I start this script, I can confirm that the browser is up and running, but after that nothing happens. – hafuuu Apr 04 '22 at 07:57

1 Answer

To extract the links to the pages, wait for the pager links with WebDriverWait and visibility_of_all_elements_located(). Either of the following locator strategies works:

  • Using CSS_SELECTOR:

    driver.get('https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/')
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.f-list-paging-num__link")))])
    
  • Using XPATH:

    driver.get('https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/')
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[contains(@class, 'f-list-paging-num__link')]")))])
    
  • Console Output:

    ['https://www.jtb.co.jp/kokunai-hotel/list/kyoto/?page=1', 'https://www.jtb.co.jp/kokunai-hotel/list/kyoto/?page=2', 'https://www.jtb.co.jp/kokunai-hotel/list/kyoto/?page=3', 'https://www.jtb.co.jp/kokunai-hotel/list/kyoto/?page=4']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
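
The hrefs printed above drop the extra query parameters of the start URL in the question. If the number of pages is already known (four in the question, or the length of the list collected above), the page URLs can also be generated without clicking through the pager. This is only a sketch: `build_page_urls` is a hypothetical helper, not part of Selenium, and it assumes the site accepts the `page` query parameter exactly as in the URLs listed in the question.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_page_urls(start_url, page_count):
    """Return start_url with its `page` query parameter set to 1..page_count.

    Hypothetical helper: assumes the site paginates via a `page` parameter.
    """
    parts = urlsplit(start_url)
    # Keep every existing query parameter except `page`, preserving order.
    base_query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    urls = []
    for page in range(1, page_count + 1):
        # Append the page number as the last query parameter.
        query = urlencode(base_query + [("page", str(page))])
        urls.append(urlunsplit(parts._replace(query=query)))
    return urls


if __name__ == "__main__":
    start = (
        "https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/"
        "?multiarea=26&dateunspecified=1&page=1"
    )
    for url in build_page_urls(start, 4):
        print(url)
```

The page count of 4 is taken from the question. In a real run it would more likely come from the number of pager links found by the wait shown above.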
    
undetected Selenium