Recursively iterate over multiple web pages and scrape using selenium

Question

This is a follow-up to my earlier question about scraping web pages.

My earlier question: Pin down exact content location in html for web scraping urllib2 Beautiful Soup

This question is about doing the same thing, but recursively over multiple pages/views.

Here is my code:

from selenium.webdriver.firefox import webdriver

driver = webdriver.WebDriver()
driver.get('http://www.walmart.com/ip/29701960?page=seeAllReviews')

for review in driver.find_elements_by_class_name('BVRRReviewDisplayStyle3Main'):
    title = review.find_element_by_class_name('BVRRReviewTitle').text
    rating = review.find_element_by_xpath('.//div[@class="BVRRRatingNormalImage"]//img').get_attribute('title')
    print title, rating

As you can see from the URL, it does not change when you navigate to the second page; otherwise this would not be an issue. Here, the Next button triggers a JavaScript call to the server. Is there a way to still scrape this using Selenium in Python with only a slight modification of the code above? Please let me know if there is.

Thanks.

barak manos · Answer 1 · 2014-04-05T16:13:45.867

Just click Next after reading each page:

from selenium.webdriver.firefox import webdriver

driver = webdriver.WebDriver()
driver.get('http://www.walmart.com/ip/29701960?page=seeAllReviews')

while True:
    for review in driver.find_elements_by_class_name('BVRRReviewDisplayStyle3Main'):
        title  = review.find_element_by_class_name('BVRRReviewTitle').text
        rating = review.find_element_by_xpath('.//div[@class="BVRRRatingNormalImage"]//img').get_attribute('title')
        print title,rating
    try:
        driver.find_element_by_link_text('Next').click()
    except:
        break

driver.quit()

Or if you want to limit the number of pages that you are reading:

from selenium.webdriver.firefox import webdriver

driver = webdriver.WebDriver()
driver.get('http://www.walmart.com/ip/29701960?page=seeAllReviews')

maxNumOfPages = 10  # for example
for pageId in range(2, maxNumOfPages + 2):
    for review in driver.find_elements_by_class_name('BVRRReviewDisplayStyle3Main'):
        title  = review.find_element_by_class_name('BVRRReviewTitle').text
        rating = review.find_element_by_xpath('.//div[@class="BVRRRatingNormalImage"]//img').get_attribute('title')
        print title,rating
    try:
        driver.find_element_by_link_text(str(pageId)).click()
    except:
        break

driver.quit()
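
Both loops above follow the same pattern: read the current page, then try to move to the next one and stop when that fails. As a rough sketch of that pattern only (not part of the original answer), the loop can be pulled into a helper that takes the Selenium-specific steps as callables. The names `scrape_pages`, `read_page`, `go_next` and `stop_on` are made up for illustration, and catching a specific exception such as Selenium's `NoSuchElementException` instead of a bare `except` is a suggestion, not something the site requires.

```python
def scrape_pages(read_page, go_next, max_pages=None, stop_on=(Exception,)):
    """Collect items page by page until go_next() fails or max_pages is hit.

    read_page -- callable returning a list of items for the current page
    go_next   -- callable that moves to the next page (e.g. clicks 'Next')
    max_pages -- optional upper bound on the number of pages to read
    stop_on   -- exception types that mean "there is no next page"
    """
    rows = []
    pages_read = 0
    while True:
        rows.extend(read_page())
        pages_read += 1
        # Stop before clicking again once the page limit is reached.
        if max_pages is not None and pages_read >= max_pages:
            break
        try:
            go_next()
        except stop_on:
            break
    return rows


# Hypothetical wiring with the driver and locators from the answer above:
#
# from selenium.common.exceptions import NoSuchElementException
#
# def read_page():
#     reviews = driver.find_elements_by_class_name('BVRRReviewDisplayStyle3Main')
#     return [r.find_element_by_class_name('BVRRReviewTitle').text for r in reviews]
#
# def go_next():
#     driver.find_element_by_link_text('Next').click()
#
# titles = scrape_pages(read_page, go_next, max_pages=10,
#                       stop_on=(NoSuchElementException,))
```

Returning a list instead of printing inside the loop makes it easier to save or post-process the reviews afterwards.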

@Anuj: You're welcome :) If the idea is good, then why did you remove the green V? — barak manos, Apr 05 '14 at 16:48
because Richard's answer is also correct. And he answered first. It was either one of you people. :) — Aks, Apr 05 '14 at 21:03

score 1 · Accepted Answer · answered Apr 05 '14 at 15:55

I think this would work. The Python might be a little off, but it should give you a starting point:

# 'continue' is a reserved word in Python, so the flag needs another name
keep_going = True
while keep_going:
    try:
        for review in driver.find_elements_by_class_name('BVRRReviewDisplayStyle3Main'):
            title = review.find_element_by_class_name('BVRRReviewTitle').text
            rating = review.find_element_by_xpath('.//div[@class="BVRRRatingNormalImage"]//img').get_attribute('title')
            print title, rating
        driver.find_element_by_name('BV_TrackingTag_Review_Display_NextPage').click()
    except:
        # No next-page element (or any other error): stop paging
        print "Done!"
        keep_going = False
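
One thing to watch for, since the next page is loaded by JavaScript: the click may return before the new reviews are in the DOM, in which case the loop could read the old page again. Selenium ships `WebDriverWait` for explicit waits of this kind. The snippet below is only a library-free sketch of the same idea, assuming that some value on the page (for example the first review title) changes when the next page has loaded. `wait_for_change` and `get_marker` are hypothetical names, not part of Selenium.

```python
import time


def wait_for_change(get_marker, old_marker, timeout=10.0, poll=0.5):
    """Poll get_marker() until it returns something other than old_marker.

    Returns the new marker value, or raises RuntimeError if nothing
    changes within `timeout` seconds.
    """
    deadline = time.time() + timeout
    while True:
        marker = get_marker()
        if marker != old_marker:
            return marker
        if time.time() >= deadline:
            raise RuntimeError("page content did not change within %.1f s" % timeout)
        time.sleep(poll)


# Hypothetical use with the driver from the answers above:
#
# def first_title():
#     return driver.find_element_by_class_name('BVRRReviewTitle').text
#
# old = first_title()
# driver.find_element_by_name('BV_TrackingTag_Review_Display_NextPage').click()
# wait_for_change(first_title, old)
```

If two consecutive pages happen to start with identical titles this check would time out, so a different marker may be a better fit for the actual page.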
