I would like to write Python code that scrapes some data from a page, clicks the "next" button at the bottom of the page, scrapes the second page, clicks "next" again, and so on until the last page, where there is no "next" button left to click.

I would like to keep the code as general as possible and not specify the number of clicks in advance. Following this question (How can I make Selenium click through a variable number of "next" buttons?), I wrote the code below. Python does not report any error, but the program stops after the first iteration (after the first click on "next").

What am I missing here? Many thanks!

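from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
driver.get("http://www.mywebsite_example.com")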
try:
    wait = WebDriverWait(driver, 100)
    wait.until(EC.element_to_be_clickable((By.CLASS_NAME,'reviews_pagination_link_nav')))    
    driver.find_element_by_class_name("reviews_pagination_link_nav").click()

    wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'reviews_pagination_link_nav')))
    while EC.element_to_be_clickable((By.CLASS_NAME,'reviews_pagination_link_nav')):
      driver.find_element_by_class_name("reviews_pagination_link_nav").click()
      if not driver.find_element_by_class_name("reviews_pagination_link_nav"):
        break
      wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'reviews_pagination_link_nav')))

finally:
    driver.quit()
anne_t

1 Answer

I would use an endless while True loop and break out of it once a TimeoutException is thrown, which would mean there are no more pages to go to:

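from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# driver is the WebDriver instance created in the question
wait = WebDriverWait(driver, 10)
while True:
    # grab the data

    # click next link
    try:
        element = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'reviews_pagination_link_nav')))
        element.click()
    except TimeoutException:
        # no clickable "next" link appeared within 10 seconds: last page reached
        break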

For this to work, make sure that on the last page the element with class="reviews_pagination_link_nav" is either absent from the page or not clickable.

alecxe
  • Thanks! I think it will work, and it is simpler, which I like. My problem was due to something else (duplicated class name between the next and previous buttons), so once I fix that, I should be in business. Thanks again! – anne_t Mar 03 '15 at 18:34
  • Me again... So the "next" and "previous" buttons have the same class names, etc. Their only difference is in the text in the span object. I know how to access that text: wait.until(EC.text_to_be_present_in_element((By.XPATH, "//div[@id='reviews_pagination_pair_container']"),'Next')) .... but how should I integrate this to the above code, so that Selenium runs the "element.click()" only when "next" is in the span object? (I know, basic loop question, but somehow, I cannot get it to work today) TIA – anne_t Mar 03 '15 at 21:02
  • @anne_t how about by xpath, smth like this `//*[contains(@class, "reviews_pagination_link_nav")]//span[. = "Next"]`? – alecxe Mar 03 '15 at 21:04
  • @anne_t guessing since I don't see an actual html :) – alecxe Mar 03 '15 at 21:05
  • Thanks, I will try that right now. Here is the relevant portion of the HTML (it is exactly the same for the "previous" button, except for the text in the span): Next >> – anne_t Mar 03 '15 at 21:35
  • @anne_t good, then it should be just: `//a[@class="reviews_pagination_link_nav"]/span[starts-with(., "Next")]` – alecxe Mar 03 '15 at 21:36
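
Putting the answer's loop and the XPath from the last comment together, a minimal sketch might look like the following. The helper names `scrape_all_pages` and `make_next_clicker` are hypothetical (they appear nowhere in the thread), and the XPath is copied verbatim from the comment, so it only fits if the page's markup really matches it. The Selenium imports sit inside the helper so that the generic loop can be exercised without a browser.

```python
# XPath copied verbatim from the last comment; it matches the <span>
# inside the "Next" link, so the click lands on that span.
NEXT_LINK_XPATH = '//a[@class="reviews_pagination_link_nav"]/span[starts-with(., "Next")]'


def scrape_all_pages(scrape_page, go_to_next_page):
    """Scrape the current page, then advance, until no next page is left.

    Both arguments are caller-supplied callables (hypothetical names):
    scrape_page() returns the data for the current page, and
    go_to_next_page() returns True if it managed to move forward.
    """
    results = []
    while True:
        results.append(scrape_page())
        if not go_to_next_page():
            break
    return results


def make_next_clicker(driver, timeout=10):
    """Build a go_to_next_page() callable for a Selenium driver."""
    # Imported here so the loop above has no Selenium dependency.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)

    def go_to_next_page():
        try:
            # Only an element whose text starts with "Next" qualifies,
            # so the identically classed "Previous" link is ignored.
            link = wait.until(
                EC.element_to_be_clickable((By.XPATH, NEXT_LINK_XPATH))
            )
        except TimeoutException:
            # No clickable "Next" within the timeout: assume last page.
            return False
        link.click()
        return True

    return go_to_next_page
```

With a live session this could be called as `scrape_all_pages(lambda: driver.title, make_next_clicker(driver))`, where `driver.title` stands in for whatever data actually needs to be collected. Note that the sketch does not wait for the new page content to load after each click; if the old "Next" link stays clickable for a moment, an extra wait on some page-specific element may be needed.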