Each of the Google review score elements (of which there are 20) on a page such as this:
is defined in XPath thus:
//ol/div[2]/div/div/div[2]/div[%s]/div/div[3]/div/a[1]/div/div/div[2]/div/span
Using Python and WebDriver, I can extract each of these as follows (where result takes each value of i in range(1, 21)):
reviewCount = driver.find_element_by_xpath("//ol/div[2]/div/div/div[2]/div[%s]/div/div[3]/div/a[1]/div/div/div[2]/div/span" % result).text
For some reason (which I've yet to determine), whenever the review count is zero and the element's text is 'No Reviews', this raises a NoSuchElementException. One would think the above XPath would continue to work (indeed, inspecting such an element in Firebug shows it follows the exact same structure as one with a non-zero score). So I am currently handling this with a try/except:
try:
    reviewCount = driver.find_element_by_xpath("//ol/div[2]/div/div/div[2]/div[%s]/div/div[3]/div/a[1]/div/div/div[2]/div/span" % result).text
    reviewCount = int(reviewCount.split()[0].replace(',', ''))
except NoSuchElementException:
    reviewCount = 0
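A side note on the try/except above: if the driver has an implicit wait configured, every failed find_element call blocks for the full timeout before NoSuchElementException is raised, which makes each zero-review entry expensive. A minimal sketch of switching the wait off around the scraping loop (no_implicit_wait and restore_seconds are hypothetical names of my own, not part of the original code):

```python
from contextlib import contextmanager

@contextmanager
def no_implicit_wait(driver, restore_seconds=10):
    """Temporarily set the implicit wait to zero so a missing element
    fails immediately instead of blocking for the full timeout."""
    driver.implicitly_wait(0)
    try:
        yield
    finally:
        # Restore whatever global wait the rest of the script relies on.
        driver.implicitly_wait(restore_seconds)
```

Wrapping the loop in `with no_implicit_wait(driver):` would then make each miss fail fast without disturbing waits elsewhere in the script.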
This is proving very costly in terms of time: each time a 'No Reviews' entry is encountered, the program takes several seconds to process it. In an effort to speed things up, I could try grabbing all review scores in one go using find_elements_by_xpath. But then how would I account for the incrementing div index? And how would I still maintain some sort of per-element exception handling?
//ol/div[2]/div/div/div[2]/div[1]/div/div[3]/div/a[1]/div/div/div[2]/div/span
//ol/div[2]/div/div/div[2]/div[2]/div/div[3]/div/a[1]/div/div/div[2]/div/span
etc.
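One way this could be sketched (assuming each listing is one div under //ol/div[2]/div/div/div[2], as the indexed XPaths above suggest) is to fetch all the listing containers with a single find_elements_by_xpath call and then search relatively inside each one, which keeps per-element exception handling; parse_review_count and get_review_counts are hypothetical helper names:

```python
def parse_review_count(text):
    # "1,234 reviews" -> 1234; "No Reviews" (or any non-numeric text) -> 0
    first = text.split()[0].replace(',', '')
    return int(first) if first.isdigit() else 0

def get_review_counts(driver):
    from selenium.common.exceptions import NoSuchElementException

    # One round trip fetches every listing container; the incrementing
    # div[%s] index disappears because find_elements returns them all.
    rows = driver.find_elements_by_xpath("//ol/div[2]/div/div/div[2]/div")
    counts = []
    for row in rows:
        try:
            # The leading "." makes the XPath relative to this row.
            span = row.find_element_by_xpath(
                "./div/div[3]/div/a[1]/div/div/div[2]/div/span")
            counts.append(parse_review_count(span.text))
        except NoSuchElementException:
            counts.append(0)
    return counts
```

The try/except stays, but it now fires once per row that genuinely lacks the span, rather than driving a fresh top-level search for every index.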
More fundamentally, why is WebDriver returning a NoSuchElementException for an element that is present and correct and differs only in its text value?