
EDIT: I now have a much more specific idea of what I'm looking for, so I'm rewriting the whole question.

My overall goal is to reach the search results beyond the first page (from within a script) on the webpage http://www.ncbi.nlm.nih.gov/images. Using the Firefox extension "Tamper Data", I inspected the requests my browser sends and found that I can modify the HTTP POST request to get any page of the results.

Now I would like to do this within a script. I've tried both

wget --post-data 'var1=foo&var2=bar&var3=...' http://www.ncbi.nlm.nih.gov/images

and

curl --data 'var1=foo&var2=bar&var3=...' http://www.ncbi.nlm.nih.gov/images

and I've tried making the initial request to http://www.ncbi.nlm.nih.gov/images?term=INSERTSEARCHTERMHERE and saving a cookie, then loading that cookie on the next request, this time with POST data indicating the page number. None of it works. Whenever I request the first URL, I get either the image search home page or a page titled "Images - Error encountered" with no search results. If I request the second URL (replacing INSERTSEARCHTERMHERE with my actual search term), I always get the first page of results, even though the POST data I send includes a variable asking for a different page. There seem to be two (maybe three?) variables denoting the page number:

EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage=14
EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.CurrPage=14

and a third one which, in Tamper Data, always holds the current page (the one I was on when I requested a new page):

EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage=1

(Yes, there are two fields in the POST data with the same name; I don't know what that is about.)
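Repeated field names are allowed in a form-encoded (`application/x-www-form-urlencoded`) body, and both `wget --post-data` and `curl --data` send a literal string as given, so both `cPage` fields reach the server as long as they are in the string. (curl also joins repeated `-d`/`--data` options with `&`, so the fields could equally be passed one per option.) A minimal bash sketch that builds only the pager-related fields, in the order Tamper Data recorded them (requested page first, current page later). The helper name `pager_fields` and the chosen subset of fields are assumptions for illustration; the server may well require the full set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the pager-related part of the POST body.
# Assumption: TERM is already URL-safe (e.g. "drug"); other characters
# would need percent-encoding first.
pager_fields() {
  local page="$1" current="$2" term="$3"
  local p="EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel"
  local fields=(
    "EntrezSystem2.PEntrez.ImagesDb.Images_SearchBar.Term=${term}"
    "${p}.Entrez_Pager.cPage=${page}"      # page being requested
    "${p}.Entrez_Pager.CurrPage=${page}"   # page being requested
    "${p}.Entrez_Pager.cPage=${current}"   # page currently shown (same name again)
    "${p}.HistoryDisplay.Cmd=PageChanged"
    "EntrezSystem2.PEntrez.DbConnector.Term=${term}"
    "EntrezSystem2.PEntrez.DbConnector.Cmd=PageChanged"
    "p%24a=${p}.Entrez_Pager.cPage"        # copied as recorded (%24 is an encoded "$")
  )
  local IFS='&'                            # "${fields[*]}" joins on the first IFS char
  printf '%s\n' "${fields[*]}"
}
```

It could then be used as `curl -s --data "$(pager_fields 14 1 drug)" http://www.ncbi.nlm.nih.gov/images`; command substitution strips the trailing newline, so the body ends cleanly.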

So how can I use curl or wget within a script to get to all pages of the search results? Thanks for your help! (And thanks to the commenters for helping me clarify the question!)
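One wget detail worth checking: `--save-cookies` leaves session cookies out of the jar unless `--keep-session-cookies` is also given, so a cookie file saved from the first request may not carry the session at all. A minimal two-request sketch in bash, assuming the URL-encoded body recorded by Tamper Data has been saved to a file (`BODY_FILE` and the function name `wget_session` are hypothetical) and that the search term is URL-safe:

```shell
#!/usr/bin/env bash
# Sketch: run a search with wget, then POST the recorded body in the same
# cookie session. Usage: ./wget_session.sh TERM BODY_FILE
# Assumptions: TERM is URL-safe and BODY_FILE holds the URL-encoded body.
wget_session() {
  local term="$1" body_file="$2" jar
  jar="$(mktemp)"
  # Without --keep-session-cookies, wget does not write session cookies.
  wget -q --save-cookies "$jar" --keep-session-cookies \
       -O first-page.html "http://www.ncbi.nlm.nih.gov/images?term=${term}"
  # Send the cookies back, keep any updates, and POST the body from the file.
  wget -q --load-cookies "$jar" --save-cookies "$jar" --keep-session-cookies \
       --post-file "$body_file" -O next-page.html "http://www.ncbi.nlm.nih.gov/images"
  rm -f "$jar"
}

# Run only when executed directly with two arguments.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -eq 2 ]]; then
  wget_session "$@"
fi
```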

Additional info: There are a ton of POST fields, and I am sending all of them. I copied this out of what Tamper Data recorded:

EntrezSystem2.PEntrez.ImagesDb.Images_SearchBar.SearchResourceList=images&EntrezSystem2.PEntrez.ImagesDb.Images_SearchBar.Term=drug&EntrezSystem2.PEntrez.ImagesDb.Images_SearchBar.CurrDb=images&EntrezSystem2.PEntrez.ImagesDb.Entrez_PageController.PreviousPageName=results&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.sPresentation=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.sPageSize=20&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.FileFormat=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.LastPresentation=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.Presentation=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.PageSize=20&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.LastPageSize=20&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.Format=&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.LastFormat=&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage=14&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.CurrPage=14&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_ResultsController.ResultCount=38231&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_ResultsController.RunLastQuery=&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage=1&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.sPresentation2=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.sPageSize2=20&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_MultiItemSupl.Discovery_SearchDetails.SearchDetailsTerm=drug%5BAll+Fields%5D&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.HistoryDisplay.Cmd=PageChanged&EntrezSystem2.PEntrez.DbConnector.Db=images&EntrezSystem2.PEntrez.DbConnector.LastDb=images&EntrezSystem2.PEntrez.DbConnector.Term=drug&EntrezSystem2.PEntrez.DbConnector.LastTabCmd=&EntrezSystem2.PEntrez.DbConnector.LastQueryKey=1&EntrezSystem2.PEntrez.DbConnector.IdsFromResult=&EntrezSystem2.PEntrez.DbConnector.LastIdsFromResult=&EntrezSystem2.PEntrez.DbConnector.LinkName=&EntrezSystem2.PEntrez.DbConnector.LinkReadableName=&EntrezSystem2.PEntrez.DbConnector.LinkSrcDb=&EntrezSystem2.PEntrez.DbConnector.Cmd=PageChanged&EntrezSystem2.PEntrez.DbConnector.TabCmd=&EntrezSystem2.PEntrez.DbConnector.QueryKey=&p%24a=EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage&p%24l=EntrezSystem2&p%24st=images
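Putting the pieces together, a minimal bash sketch of a paging loop with curl that keeps one cookie jar for the whole session: it runs the search once with GET, then POSTs the recorded body for pages 2 to LAST. To use it, save the body above to a file and replace the first two page values (`cPage=14` and `CurrPage=14`) with the placeholder `__PAGE__` and the later `cPage=1` with `__CURR__`; those placeholder names, the output file names, and the Referer header are assumptions of this sketch, and it has not been checked against the live site. The search term inside the saved body (`drug` here) must match TERM.

```shell
#!/usr/bin/env bash
# Sketch: download result pages 2..LAST of an NCBI Images search with curl,
# reusing one cookie jar for the whole session.
# Usage: ./fetch_pages.sh TERM LAST_PAGE BODY_TEMPLATE_FILE
# Assumption (untested against the live site): the recorded POST body is
# accepted with only the page numbers changed.
set -euo pipefail

BASE_URL="http://www.ncbi.nlm.nih.gov/images"
JAR=""

# Fill the hypothetical placeholders __PAGE__ (page requested) and
# __CURR__ (page currently shown) in a recorded, already URL-encoded body.
fill_template() {
  local template="$1" page="$2" curr="$3"
  template="${template//__PAGE__/$page}"
  printf '%s' "${template//__CURR__/$curr}"
}

main() {
  local term="$1" last="$2" template page
  template="$(<"$3")"          # $(<file) also drops trailing newlines
  JAR="$(mktemp)"
  trap 'rm -f "$JAR"' EXIT

  # 1) Run the search once with GET so the server can set its session cookies.
  curl -s -c "$JAR" -o page-1.html -G --data-urlencode "term=${term}" "$BASE_URL"

  # 2) Page forward with POST: send the cookies back (-b) and store updates (-c).
  #    The Referer header (-e) is sent in case the server checks it (assumption).
  for (( page = 2; page <= last; page++ )); do
    curl -s -b "$JAR" -c "$JAR" -e "$BASE_URL" \
         --data "$(fill_template "$template" "$page" "$((page - 1))")" \
         -o "page-${page}.html" "$BASE_URL"
  done
}

# Run only when executed directly with three arguments, so the functions can
# be sourced or tested without touching the network.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -eq 3 ]]; then
  main "$@"
fi
```

For example, `./fetch_pages.sh drug 5 body.txt` would write `page-1.html` through `page-5.html`. The "current page" sent with each request is simply the previous page number, mirroring what the browser reports when you click "Next".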

daveloyall
andy
  • I've had a look at the page and the "Next" button uses a link (anchor tag) that contains some special attributes, namely page="2". That then becomes page="3" as you move forward. Does that point you in the right direction? – Tomas McGuinness Apr 15 '11 at 18:59
  • Thanks. I have seen that too but I don't know how to interact with that "page=num" feature. How would I pass that into a new request? Do I need to create an http request by hand? (I've never done that before...) – andy Apr 15 '11 at 19:05
  • I'm guessing here, but it seems likely you could submit a form with the page number you would like. There is a form on the page that passes a variable called EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage to a URL /images. You could try emulating that POST. You should be able to find a browser extension to help you with this; look at this question for some pointers on the extensions http://stackoverflow.com/questions/725998/are-there-firefox-extension-or-any-other-browser-that-allow-to-send-arbitrary-p – Tomas McGuinness Apr 15 '11 at 19:10
  • You don't say what programming language you're using. Perl, Python and many others have large libraries dedicated to these sorts of problems. Add a tag for your language, and the probability of a useful answer will increase dramatically ;-) Good luck! – shellter Apr 17 '11 at 04:28
  • @andy in tamper data, what was the exact search you did including any other options you used? – Malcolm Jones Aug 14 '13 at 20:25

0 Answers