2

I have to click on each search result one by one from this url:

Search Guidelines

I first extract the total number of results from the displayed text so that I can set the upper limit for iteration

# Read the result count from the element with id "total_results"
upperlimit = driver.find_element_by_id("total_results")
number = int(upperlimit.text.split(' ')[0])

The loop is then defined as for i in range(1, number):

However, after going through the first 10 results on the first page, the list index goes out of range (probably because there are no more links to click). I need to click on "Next" to get the next 10 results, and so on until I'm done with all search results. How can I go about doing that?
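For illustration, here is a minimal sketch of the arithmetic involved, assuming 10 results per page (as observed above) and a counter text that starts with the number, such as "44 results" (the exact wording is an assumption). The helper names `parse_total`, `page_count` and `locate` are hypothetical. Note also that `range(1, number)` yields only `number - 1` values.

```python
RESULTS_PER_PAGE = 10  # assumption: the search page shows 10 results per page


def parse_total(text):
    """Return the leading integer of a counter text such as '44 results'."""
    return int(text.split()[0])


def page_count(total, per_page=RESULTS_PER_PAGE):
    """Number of pages needed to show `total` results (rounded up)."""
    return (total + per_page - 1) // per_page


def locate(result_index, per_page=RESULTS_PER_PAGE):
    """Map a 0-based overall result index to (1-based page, 0-based position on that page)."""
    page, position = divmod(result_index, per_page)
    return page + 1, position
```

With a mapping like this, the index used to pick a link on the current page never exceeds 9, so it cannot run past the end of the list of links on a full page.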

Any help would be appreciated!

Seraphim's
user3691767

2 Answers

2

The problem is that the value of the element with id total_results changes after the page is loaded: at first it contains 117, then it changes to 44.

Instead, here is a more robust approach. It processes the results page by page until there are no more pages left:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Firefox()
url = 'http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true#/search/?searchText=bevacizumab&mode=&staticTitle=false&SEARCHTYPE_all2=true&SEARCHTYPE_all1=&SEARCHTYPE=GUIDANCE&TOPICLVL0_all2=true&TOPICLVL0_all1=&HIDEFILTER=TOPICLVL1&HIDEFILTER=TOPICLVL2&TREATMENTS_all2=true&TREATMENTS_all1=&GUIDANCETYPE_all2=true&GUIDANCETYPE_all1=&STATUS_all2=true&STATUS_all1=&HIDEFILTER=EGAPREFERENCE&HIDEFILTER=TOPICLVL3&DATEFILTER_ALL=ALL&DATEFILTER_PREV=ALL&custom_date_from=&custom_date_to=11-06-2014&PAGINATIONURL=%2FSearch.do%3FsearchText%40%40bevacizumab%26newsearch%40%40true%26page%40%40&SORTORDER=BESTMATCH'
driver.get(url)

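page_number = 1
while True:
    try:
        # Look for the pagination link whose text is the page number
        link = driver.find_element_by_link_text(str(page_number))
    except NoSuchElementException:
        # No link for this page number: the last page has been processed
        break
    link.click()
    print(driver.current_url)
    page_number += 1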

Basically, the idea here is to keep getting the next page link until there is no such link (at which point a NoSuchElementException is thrown). Note that this works for any number of pages and results.

It prints:

http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=1
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=2#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=3#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=4#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=5#showfilter
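As a sketch of how per-page work could be plugged into the loop above, the version below separates the "click page n" step from the "process this page" step. The names `visit_all_pages`, `click_page`, `handle_page` and `make_selenium_clicker` are hypothetical, and the adapter assumes the Selenium 4 style `find_element(By.LINK_TEXT, ...)` call rather than the older `find_element_by_link_text` used above.

```python
def visit_all_pages(click_page, handle_page):
    """Click page links 1, 2, 3, ... until one is missing.

    click_page(n) returns True if the link for page n was found and clicked.
    handle_page(n) is called once for every page that was opened.
    Returns the number of pages visited.
    """
    page_number = 1
    while click_page(page_number):
        handle_page(page_number)  # e.g. loop over this page's results here
        page_number += 1
    return page_number - 1


def make_selenium_clicker(driver):
    """Build a click_page callable around a Selenium driver (assumes Selenium 4)."""
    # Imported lazily so the loop above can be exercised without Selenium installed
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    def click_page(n):
        try:
            driver.find_element(By.LINK_TEXT, str(n)).click()
        except NoSuchElementException:
            return False
        return True

    return click_page
```

A call such as `visit_all_pages(make_selenium_clicker(driver), process_results)` would then run a hypothetical `process_results` function once per page, with any per-page result index starting from zero each time.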
alecxe
  • Oh yeah this one works just fine. But I'll need to incorporate the other functionalities in this one. Can I come back to you if I have any problems? – user3691767 Jun 11 '14 at 15:44
  • @user3691767 sure, consider creating separate SO questions if you need further help. Also, if this one is resolved, consider accepting whichever of the provided answers you think deserves it. Thanks. – alecxe Jun 11 '14 at 15:48
  • You just made my day @alecxe I was about to spend all night thinking about how to tackle this problem. I am now able to traverse to each search result and get the required data. THANKS A MILLION!!!! – user3691767 Jun 11 '14 at 16:02
0

There is no need to programmatically press the Next button. If you look carefully, the URL just takes a page parameter when browsing other result pages:

url = "http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page={}#showfilter"

# Pages are numbered from 1; range(1, 6) visits pages 1 to 5
for i in range(1, 6):
    driver.get(url.format(i))

    upperlimit = driver.find_element_by_id("total_results")
    number = int(upperlimit.text.split(' ')[0])

If you still want to programmatically press the Next button, you could use:

driver.find_element_by_class_name('next').click()

But I haven't tested that.
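To avoid hard-coding the number of pages, the page URLs could be generated from the total result count. This is only a sketch: it assumes 10 results per page and the `searchText`, `newsearch` and `page` query parameters seen in the URLs above, and `page_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.nice.org.uk/Search.do"


def page_urls(search_text, total_results, per_page=10):
    """Return one URL per result page for the given search text."""
    pages = (total_results + per_page - 1) // per_page  # round up
    urls = []
    for page in range(1, pages + 1):
        query = urlencode(
            {"searchText": search_text, "newsearch": "true", "page": page}
        )
        urls.append("{}?{}#showfilter".format(BASE_URL, query))
    return urls
```

Each URL can then be passed to `driver.get(...)` in turn, so the same loop works for keywords with a different number of results.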

PepperoniPizza
  • For each keyword that I search, the number of results varies. With the code that you gave, setting the range as 5 won't do the trick for other keywords, right? – user3691767 Jun 11 '14 at 15:36
  • What I was able to do was put an exception like this: except IndexError: driver.find_element_by_class_name("next").click().... but this part runs only when it reaches the end of page the first time, and then it simply keeps on clicking on next, whereas I want to start the whole thing again. – user3691767 Jun 11 '14 at 15:39
  • @user3691767 I only showed an example about your question, clicking on the next button, of course handling result pages is different. alecxe shows a way of doing it. – PepperoniPizza Jun 11 '14 at 15:45