Python: Getting the current page URL while moving through next pages

Question

I'm writing a Python script that records the current page URL, clicks through to the next page, and records that page's URL as well.

I can confirm that the browser starts and opens the start page, but after that nothing happens.

e.g., start page:

`https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=1`

The URLs I want to extract are those of the following 4 pages:

・https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=1

・https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=2

・https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=3

・https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/?multiarea=26&dateunspecified=1&page=4

I wrote the script below.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
from time import sleep


options = Options()
driver = webdriver.Chrome('path', options=options)  # 'path' is the chromedriver location


pageURL = 'https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/'
driver.get(pageURL)
sleep(3)


elem_urls = []


while True:
    # current_url is a single string, so append it whole
    elem_urls.append(driver.current_url)

    try:
        # look up the "next" link by its class name and follow it
        next_button = driver.find_element(By.CLASS_NAME, 'f-list-paging__next')
        next_button.click()
        sleep(3)

    except NoSuchElementException:
        # no "next" link on this page, so stop
        break

What is the problem you are facing? What's going wrong when you execute this? — M B, Apr 04 '22 at 07:27
Thanks for your comment. When I start this script, I can confirm that the browser is up and running, but after that nothing happens. — hafuuu, Apr 04 '22 at 07:57

score 0 · Answer 1 · answered Apr 04 '22 at 08:35

To extract the links for the pages, use WebDriverWait with visibility_of_all_elements_located() so that the pagination links are visible before you read them. You can use either of the following locator strategies:

Using CSS_SELECTOR:

driver.get('https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/')
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.f-list-paging-num__link")))])

Using XPATH:

driver.get('https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/')
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[contains(@class, 'f-list-paging-num__link')]")))])

Console Output:

['https://www.jtb.co.jp/kokunai-hotel/list/kyoto/?page=1', 'https://www.jtb.co.jp/kokunai-hotel/list/kyoto/?page=2', 'https://www.jtb.co.jp/kokunai-hotel/list/kyoto/?page=3', 'https://www.jtb.co.jp/kokunai-hotel/list/kyoto/?page=4']
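The hrefs differ only in the `page` query parameter, so once the last page number is known the list can also be generated without clicking through the site. A minimal sketch using only the standard library; `build_page_urls` is a hypothetical helper name, and the page count of 4 is taken from the question rather than read from the site:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_page_urls(url, last_page):
    """Return `url` with its `page` query parameter set to 1..last_page."""
    parts = urlsplit(url)
    # Keep every existing query parameter except `page`, preserving order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    return [
        urlunsplit(parts._replace(query=urlencode(query + [("page", str(n))])))
        for n in range(1, last_page + 1)
    ]


start = (
    "https://www.jtb.co.jp/kokunai-hotel/list/kyoto/feature/couple_yado/"
    "?multiarea=26&dateunspecified=1&page=1"
)
for page_url in build_page_urls(start, 4):  # 4 pages is an assumption
    print(page_url)
```

This only works if the site really paginates through that parameter and the page count is known in advance; otherwise the count still has to be read from the pagination links.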

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
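
If you would rather keep the approach from the question (read `driver.current_url`, click "next", repeat), the loop could look roughly like the sketch below. It assumes the "next" link has the class `f-list-paging__next` used in the question and that clicking it changes the URL. `collect_page_urls` and `click_next_with_selenium` are hypothetical helper names, and `max_pages` is an arbitrary safety cap:

```python
def collect_page_urls(driver, click_next, max_pages=50):
    """Record driver.current_url on each page until click_next returns False."""
    urls = []
    for _ in range(max_pages):  # hard cap guards against an endless loop
        urls.append(driver.current_url)  # append the whole string at once
        if not click_next(driver):
            break
    return urls


def click_next_with_selenium(driver, timeout=10):
    """Click the "next" link; return False if it never becomes clickable."""
    # Imported lazily so collect_page_urls can be tried without Selenium.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    old_url = driver.current_url
    try:
        # Assumed locator: the class name comes from the question's script.
        WebDriverWait(driver, timeout).until(
            EC.element_to_be_clickable((By.CLASS_NAME, "f-list-paging__next"))
        ).click()
        # Wait for the navigation so the next current_url read is the new page.
        WebDriverWait(driver, timeout).until(EC.url_changes(old_url))
    except TimeoutException:
        return False
    return True
```

With a live driver this would be called as `collect_page_urls(driver, click_next_with_selenium)`. Treating a timeout as "no more pages" is a simplification: a slow page load would end the loop early, so the timeout may need tuning.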
