
I'm trying to click the "Next" button on the following page, with the ultimate goal of cycling through pages 2-8 using Python + mechanize.

https://www.ncbi.nlm.nih.gov/pubmed/?term=shi+LL

I'm using the following code:

import mechanize
import cookielib
from bs4 import BeautifulSoup
import urllib


br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follow refresh 0 but don't hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# Want debugging messages?
#br.set_debug_http(True)
#br.set_debug_redirects(True)
#br.set_debug_responses(True)

# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

br.open("https://www.ncbi.nlm.nih.gov/pubmed/?term=shi+LL")

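# NOTE: `link` is never defined; this is the part I'm stuck on.
# click_link() only builds a Request for the link; follow_link() actually opens it.
request = br.click_link(link)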

response = br.follow_link(link)

print response.geturl()

But I don't know what to put in the `link` variable, since the Next button has `href="#"` and there are multiple other links on the same page with that same href.
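One way to see why an `href="#"` link can't take you to page 2 is to resolve it against the page URL the way a browser or mechanize would. The sketch below is Python 3 and uses only the standard library; `resolve_href` is a hypothetical helper, and `/pubmed/12345` is a made-up relative link used only for contrast.

```python
from urllib.parse import urldefrag, urljoin

# The results page from the question.
PAGE_URL = "https://www.ncbi.nlm.nih.gov/pubmed/?term=shi+LL"


def resolve_href(page_url, href):
    """Resolve an <a href> against the page it appears on, dropping any #fragment."""
    return urldefrag(urljoin(page_url, href)).url


# href="#" resolves back to the page you are already on, so fetching it
# just reloads page 1 of the results.
print(resolve_href(PAGE_URL, "#"))

# An ordinary relative href (hypothetical here) points somewhere new.
print(resolve_href(PAGE_URL, "/pubmed/12345"))
```

Whatever makes the Next button load page 2, it is not the URL in its `href`.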

This is the HTML for the Next button at the top of the page:

<a name="EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Entrez_Pager.Page" title="Next page of results" class="active page_link next" href="#" sid="3" page="2" accesskey="k" id="EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Entrez_Pager.Page">Next &gt;</a>
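Although many anchors share `href="#"`, this one can be picked out by its `title` and its `next` class. Its `page` attribute also shows which page it is meant to load. The Python 3 sketch below uses the standard-library `html.parser` on an abbreviated copy of the anchor plus a hypothetical decoy anchor. It only identifies the element. It does not make it followable, because the page change presumably happens in JavaScript, as the comments suggest.

```python
from html.parser import HTMLParser


class NextLinkFinder(HTMLParser):
    """Collect the attributes of every <a> whose title marks it as the Next-page link."""

    def __init__(self, title="Next page of results"):
        super().__init__()
        self.title = title
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # href is "#" on many anchors, so match on title and class instead.
        classes = (attr_map.get("class") or "").split()
        if attr_map.get("title") == self.title and "next" in classes:
            self.matches.append(attr_map)


# Abbreviated copy of the anchor from the question (name/id omitted),
# preceded by a hypothetical decoy that also has href="#".
SAMPLE = """
<a href="#" class="sort">Sort by date</a>
<a title="Next page of results" class="active page_link next" href="#"
   sid="3" page="2" accesskey="k">Next &gt;</a>
"""

finder = NextLinkFinder()
finder.feed(SAMPLE)
next_link = finder.matches[0]
print(next_link["page"])  # the page number this button is meant to load
```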

I've also tried to cycle through the pages by entering the numbers 2-8 into the page-number text box at the top of the page, with no luck, since there is no submit or search button anywhere.
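A form without a submit button can still be submitted by the page's JavaScript. In principle, a script could reproduce that by sending the form's fields itself with the page number changed. The Python 3 sketch below only builds such a request body. The form HTML and the field names (`term`, `hypothetical_page_box`) are made up for illustration. Whether PubMed's pager actually worked this way, and which fields it needed, is an assumption that would have to be checked against the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from every named <input> in the HTML fed to it."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("name"):
            self.fields[attr_map["name"]] = attr_map.get("value") or ""


def pager_payload(form_html, page_field, page):
    """Return the urlencoded body a pager *might* POST for `page` (a sketch)."""
    collector = FormFieldCollector()
    collector.feed(form_html)
    fields = dict(collector.fields)
    fields[page_field] = str(page)  # overwrite (or add) the page-number field
    return urlencode(fields)


# Hypothetical form: these field names are NOT taken from the real PubMed page.
FORM = """
<form method="post" action="/pubmed/">
  <input type="hidden" name="term" value="shi LL">
  <input type="text" name="hypothetical_page_box" value="1">
</form>
"""

for page in range(2, 9):  # pages 2-8, as in the question
    print(pager_payload(FORM, "hypothetical_page_box", page))
```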

Any ideas?

user2780563
  • I think the form associated with the page is more relevant. Normally [href='#'](http://stackoverflow.com/questions/4855168/what-is-href-and-why-is-it-used) just scrolls to the top of the page, so there is definitely something else going on for it to go to the next page of results... – Tadhg McDonald-Jensen May 23 '16 at 00:14
  • **`a` elements with `href="#"` are not links**, you cannot follow them because they don't lead anywhere. They are used to trigger JavaScript functions (sorting the results on the page, etc.), but they don't point to any page. Just **ignore them** and move on. – Marco Bonelli May 23 '16 at 00:14
  • agreed, but then how do you navigate to the next page of results (articles 21-40) if there is no href to follow? – user2780563 May 23 '16 at 01:00
  • Never mind, I figured it out. I had to use Selenium: mechanize doesn't run JavaScript, so it can't click the Next button, but Selenium can. – user2780563 May 23 '16 at 03:35
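The Selenium code the asker ended up with was never posted, so the following is only a sketch of the approach under stated assumptions. It takes a Selenium WebDriver that is already showing page 1 and clicks Next for pages 2-8, handing each page's HTML to a callback. The CSS selector `a.page_link.next` is derived from the `class` attribute in the question's HTML. `"css selector"` is the string value of Selenium's `By.CSS_SELECTOR`. The function and callback names are hypothetical. The function is duck-typed, so it can be exercised without a browser.

```python
NEXT_SELECTOR = "a.page_link.next"  # matches class="active page_link next"


def visit_pages(driver, last_page, on_page):
    """Click "Next" repeatedly, calling on_page(page_number, html) for pages 2..last_page.

    `driver` is assumed to be a Selenium WebDriver already showing page 1.
    """
    for page in range(2, last_page + 1):
        # Re-find the link every time: after a click the old element may be stale.
        next_link = driver.find_element("css selector", NEXT_SELECTOR)
        next_link.click()
        # A real script should wait here (e.g. with WebDriverWait) until the
        # new results have loaded before reading page_source.
        on_page(page, driver.page_source)


# Example use with a real browser (needs Selenium plus a driver such as geckodriver):
# from selenium import webdriver
# driver = webdriver.Firefox()
# driver.get("https://www.ncbi.nlm.nih.gov/pubmed/?term=shi+LL")
# visit_pages(driver, 8, lambda n, html: print(n, len(html)))
# driver.quit()
```

Passing a callback keeps the page-walking loop separate from whatever parsing you do on each page (for example with BeautifulSoup, which the original script already imports).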
