I'm trying to click the "Next" button on the following page, with the ultimate goal of cycling through pages 2-8 using Python + mechanize:
https://www.ncbi.nlm.nih.gov/pubmed/?term=shi+LL
I'm using the following code:
import mechanize
import cookielib
from bs4 import BeautifulSoup
import urllib
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but does not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# Want debugging messages?
#br.set_debug_http(True)
#br.set_debug_redirects(True)
#br.set_debug_responses(True)
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.open("https://www.ncbi.nlm.nih.gov/pubmed/?term=shi+LL")
request = br.click_link(link)
response = br.follow_link(link)
print response.geturl()
But I don't know what to put in the "link" variable: the Next button has href="#", and several other links on the same page share that same href.
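One way to disambiguate links that share href="#" is to match on other attributes. mechanize's `Browser.find_link()` (and `click_link()` / `follow_link()`, which take the same keyword criteria) accepts a `predicate` function, and each mechanize `Link` exposes the tag's attributes as a list of (name, value) pairs in `link.attrs`. The sketch below assumes the title and class shown in the HTML further down are stable; `FakeLink` and its "Prev" entry are hypothetical stand-ins so the predicate can be tried offline. It runs on Python 2 or 3.

```python
from collections import namedtuple

NEXT_TITLE = "Next page of results"


def is_next_link(link):
    """Return True for the pager's "Next" anchor.

    link.attrs is a list of (name, value) pairs, so a dict lets us match
    on title/class instead of the ambiguous href="#".
    """
    attrs = dict(link.attrs)
    classes = (attrs.get("class") or "").split()
    return attrs.get("title") == NEXT_TITLE and "next" in classes


# Hypothetical stand-in for mechanize.Link, for offline testing only.
FakeLink = namedtuple("FakeLink", ["url", "text", "attrs"])

links = [
    # Made-up decoy that shares href="#" with the real Next link.
    FakeLink("#", "Prev", [("href", "#"), ("class", "page_link prev"),
                           ("title", "Previous page of results")]),
    FakeLink("#", "Next >", [("href", "#"), ("class", "active page_link next"),
                             ("title", NEXT_TITLE), ("page", "2")]),
]
next_links = [l for l in links if is_next_link(l)]
print([l.text for l in next_links])
```

With a real browser this would be used as `br.find_link(predicate=is_next_link)`, which returns the first match (there may be a second Next button at the bottom of the page). Note that this only solves *selecting* the link: following href="#" most likely just re-requests the current URL, because the page number is handled by JavaScript, which mechanize does not execute.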
This is the HTML for the Next button at the top of the page:
<a name="EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Entrez_Pager.Page" title="Next page of results" class="active page_link next" href="#" sid="3" page="2" accesskey="k" id="EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Entrez_Pager.Page">Next ></a>
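The snippet above shows that the target page number lives in the non-standard `page` attribute while the `href` is only `#`. A minimal sketch that makes this visible, using Python 3's standard-library `html.parser` on exactly that snippet; the class name `PagerLinkParser` is hypothetical, and selecting on the `page_link` class is an assumption based on this one tag.

```python
from html.parser import HTMLParser

SNIPPET = (
    '<a name="EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Entrez_Pager.Page" '
    'title="Next page of results" class="active page_link next" href="#" sid="3" '
    'page="2" accesskey="k" '
    'id="EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Entrez_Pager.Page">Next ></a>'
)


class PagerLinkParser(HTMLParser):
    """Collect the attributes of every <a> whose class list contains 'page_link'."""

    def __init__(self):
        super().__init__()
        self.pager_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)  # valueless attributes map to None
        if "page_link" in (attr_map.get("class") or "").split():
            self.pager_links.append(attr_map)


parser = PagerLinkParser()
parser.feed(SNIPPET)
parser.close()
for link in parser.pager_links:
    # The destination is only in the custom "page" attribute, not in href.
    print(link["title"], "-> href", link["href"], "page", link["page"])
```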
I've also tried cycling through the pages by entering the numbers 2-8 into the page-number text box at the top of the page, with no luck, since there is no submit or search button anywhere.
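As an alternative route rather than a fix for the button, NCBI's E-utilities ESearch endpoint pages through PubMed results with plain query parameters (`retstart`, the 0-based offset of the first hit, and `retmax`, the page size), so no JavaScript is involved. This sketch only builds the URLs for pages 2-8; the 20-results-per-page figure is an assumption, `page_urls` is a hypothetical helper, and ESearch returns PubMed IDs as XML, not the HTML results page. Written for Python 3 (`urllib.urlencode` is the Python 2 equivalent).

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def page_urls(term, first_page, last_page, per_page=20):
    """Build one ESearch URL per 1-based results page (per_page is assumed)."""
    urls = []
    for page in range(first_page, last_page + 1):
        params = {
            "db": "pubmed",
            "term": term,
            "retstart": (page - 1) * per_page,  # 0-based offset of this page's first hit
            "retmax": per_page,
        }
        urls.append(ESEARCH + "?" + urlencode(params))
    return urls


for url in page_urls("shi LL", 2, 8):
    print(url)
```

Each URL can then be fetched with `br.open(url)` or any HTTP client, since the offsets are in the query string instead of being set by the pager's JavaScript.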
Any ideas?