import com.jaunt.*;

public class JauntCrawler {
  public static void main(String[] args) {
    try {
      UserAgent userAgent = new UserAgent();               // create new userAgent (headless browser)
      userAgent.visit("http://google.de");                 // visit Google
      userAgent.doc.apply("schmetterlinge");               // apply form input (starting at first editable field)
      userAgent.doc.submit();                              // submit the search form

      Elements links = userAgent.doc.findEvery("<h3 class=r>").findEvery("<a>");  // find search result links
      for (Element link : links) System.out.println(link.getAt("href"));          // print results

      if (userAgent.doc.nextPageLinkExists()) {
        userAgent.visit(userAgent.doc.nextPageLink().getHref());
        Elements newlinks = userAgent.doc.findEvery("<h3 class=r>").findEvery("<a>");
        System.out.println("\nPage 2:");
        for (Element link : newlinks) System.out.println(link.getAt("href"));
      }
    }
    catch (JauntException e) {   // if an HTTP/connection error occurs, handle JauntException
      System.err.println(e);
    }
  }
}

I want to retrieve more than the first page of Google search results. The second for-loop should print the results of the next page, but it doesn't. Any idea why?

Pete
  • Since you're scraping a German Google page, it doesn't contain the text "Page 2". Run the query in a browser and look at the source of the first page to find the German link to page 2. Good luck, as the page is mostly JavaScript calls. – Gilbert Le Blanc Jul 17 '15 at 17:51
  • I'm only printing "Page 2" to the console ;) I don't think it's a language problem. – Pete Jul 19 '15 at 08:31

1 Answer


I ran into the same problem: the user agent does not go to the next page. However, I found another way to achieve this:

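Elements nextLinks = userAgent.doc.findEvery("<a class=fl>");   // assumed: the page-number links below the results
for (int i = 0; i < nextLinks.size(); i++) {
    // search_string is a placeholder for your URL-encoded query; start = offset of the first result (10 per page)
    userAgent.visit("http://google.co.in/search?q=" + search_string + "&start=" + (i + 1) * 10);
    links = userAgent.doc.findEvery("<h3 class=r>").findEvery("<a>");   // reuses 'links' from the question's code
    for (Element link : links) System.out.println(link.getAt("href"));
}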
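The core idea here is to build the URL of each result page yourself instead of following the "next" link. Below is a minimal, stdlib-only sketch of that URL construction. It assumes 10 results per page and that Google's `start` parameter is the 0-based offset of the first result (as the `(i + 1) * 10` above implies). The class and method names (`GooglePageUrls`, `pageUrls`) are hypothetical, and Google may change these parameters at any time. It also URL-encodes the query, which the loop above leaves to you.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

public class GooglePageUrls {

    // Hypothetical helper: returns one search URL per result page.
    // Assumes 10 results per page and that "start" is the 0-based index of the first result.
    static List<String> pageUrls(String host, String query, int pages) {
        String q;
        try {
            q = URLEncoder.encode(query, "UTF-8");   // e.g. spaces become '+'
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);      // UTF-8 is always supported by the JVM
        }
        List<String> urls = new ArrayList<>();
        for (int page = 0; page < pages; page++) {
            urls.add(host + "/search?q=" + q + "&start=" + page * 10);
        }
        return urls;
    }

    public static void main(String[] args) {
        // Prints the URLs for pages 1 to 3; each one could be passed to userAgent.visit(...)
        for (String url : pageUrls("http://google.de", "schmetterlinge", 3)) {
            System.out.println(url);
        }
    }
}
```

Each generated URL can then be visited with `userAgent.visit(...)` and parsed with the same `findEvery` query as the first page.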
Zeeshan Amber