
Hi, I'm new to web scraping and have been trying to use Selenium in Python to scrape a forum.

I am trying to get Selenium to click "Next" until the last page, but I am not sure how to break out of the loop, and I am having trouble with the locator.

When I locate the Next button by partial link text, the automated clicking continues into the next thread, e.g. page 1 -> page 2 -> next thread -> page 1 of next thread -> page 2 of next thread:

while True:
    next_link = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, "Next")))
    next_link.click()

When I locate the Next button by class name, the automated clicking clicks the "Prev" button when it reaches the last page:

while True:
    next_link = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "prevnext")))
    next_link.click()

My questions are:

  1. Which locator should I use (by class, by partial link text, or something else)?
  2. How do I break the loop so it stops clicking when it reaches the last page?
Murthi
user9826192
  • Share HTML code for pagination block. Also share HTML for Next button both for cases when last page reached and not reached – Andersson May 25 '18 at 07:08
  • You can use whatever selector you want if it works. I generally use id or xpath in such cases. Insert an `if` statement with a condition that detects the last page, and put a `break` statement inside it – Abhijeetk431 May 25 '18 at 07:09

3 Answers

  1. You can use any locator that uniquely identifies the element. Best practice suggests the following order of preference:

    • Id
    • Name
    • Class Name
    • Css Selector
    • Xpath
    • Others
  2. To exit the while loop when the element is not found, wrap the lookup in a try block as shown below. The break statement exits the loop once the wait times out.

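    from selenium.common.exceptions import TimeoutException

    while True:
        try:
            # Wait up to 10 seconds for the element; a timeout means it is not on the page.
            next_link = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "prevnext")))
            next_link.click()
        except TimeoutException:
            break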
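
Because the "prevnext" class can match both the Prev and the Next button, one possible refinement is to check the visible text before clicking. The sketch below is only an illustration: `pick_next` is a hypothetical helper, and the plain labels "Prev" and "Next" are assumptions about the page, with simple stand-in objects used in place of real WebElements.

```python
from types import SimpleNamespace


def pick_next(elements, label="Next"):
    """Return the first element whose visible text contains `label`, else None."""
    for element in elements:
        if label in element.text:
            return element
    return None


# Stand-ins for WebElements (assumed labels). Real elements could come from
# driver.find_elements(By.CLASS_NAME, "prevnext"), which returns a list.
middle_page = [SimpleNamespace(text="Prev"), SimpleNamespace(text="Next")]
last_page = [SimpleNamespace(text="Prev")]

print(pick_next(middle_page).text)  # Next
print(pick_next(last_page))         # None
```

Inside the loop you could call `pick_next(driver.find_elements(By.CLASS_NAME, "prevnext"))` and `break` when it returns `None`. Note that `find_elements` does not wait for the page to load, and if the class sits on a wrapper element you may need to click the link inside it instead.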
    
Murthi
  • Thank you, but using "prevnext" creates endless clicking, because on the last page there is a "Prev" button that is also matched by "prevnext". This was my previous issue. – user9826192 May 28 '18 at 00:52

There are a couple of things you need to consider:

  • There are two elements on the page with the text Next, one at the top and another at the bottom, so you need to decide which element to interact with and construct a unique Locator Strategy.
  • Since you want to invoke click() on the element, use the expected condition element_to_be_clickable() instead of presence_of_element_located().
  • When no element with the text Next remains, you need to move on to the remaining steps, so invoke click() within a try-except block and break out of the loop when an exception is raised.
  • For your requirement, XPath looks like a good Locator Strategy to me.
  • Here is the working code block:

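    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    # Selenium 4 style: the driver path goes through Service, the options through `options=`.
    driver = webdriver.Chrome(service=Service(executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe'), options=options)
    driver.get("https://forums.hardwarezone.com.sg/money-mind-210/hdb-fully-paid-up-5744914.html")
    # On the first page, the first "prevnext" link in the top pagination table is the Next link.
    driver.find_element(By.XPATH, "//a[@id='poststop' and @name='poststop']//following::table[1]//li[@class='prevnext']/a").click()
    while True:
        try:
            # Only match a "prevnext" link whose text contains "Next", so Prev is never clicked.
            WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='poststop' and @name='poststop']//following::table[1]//li[@class='prevnext']/a[contains(.,'Next')]"))).click()
        except TimeoutException:
            # No clickable Next link appeared within 10 seconds: this is the last page.
            print("No more pages left")
            break
    driver.quit()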
    
  • Console Output:

    No more pages left
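
As a further safeguard against endless clicking, the same try/break pattern can be wrapped in a loop with an upper bound on the number of clicks. This is a minimal sketch under assumptions: `click_through_pages`, `FakePager` and the `max_pages` limit are hypothetical names introduced here for illustration, and a fake pager stands in for the browser so the loop logic can be run on its own.

```python
def click_through_pages(click_next, stop_on, max_pages=1000):
    """Call click_next() until it raises `stop_on` or max_pages clicks are done.

    Returns the number of successful clicks.
    """
    clicks = 0
    while clicks < max_pages:
        try:
            click_next()
        except stop_on:
            break  # no Next link found: last page reached
        clicks += 1
    return clicks


class FakePager:
    """Stand-in for a browser session with a fixed number of Next clicks left."""

    def __init__(self, remaining):
        self.remaining = remaining

    def click_next(self):
        if self.remaining == 0:
            raise TimeoutError("no Next link")
        self.remaining -= 1


pager = FakePager(remaining=3)
print(click_through_pages(pager.click_next, stop_on=TimeoutError))  # 3
```

With Selenium, `click_next` could be a function that performs the `WebDriverWait(...).until(EC.element_to_be_clickable(...)).click()` call from the code above, with `stop_on=TimeoutException`.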
    
undetected Selenium

You can use the code below to click the Next button until the last page is reached, breaking out of the loop once the button is no longer present:

from selenium.common.exceptions import TimeoutException

while True:
    try:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "Next ›"))).click()
    except TimeoutException:
        break
Andersson
  • thank you so much! this solved my issue. previously I realised my mistake was that I was using Next › instead of Next > – user9826192 May 28 '18 at 00:47