How to extract links from Google HTML result page?

Question

I am reading a text file that contains HTML code from Google search results. Then I parse it and I try to extract the links with this code:

FileReader in = new FileReader("A.txt");
BufferedReader p = new BufferedReader(in);
while(p.readLine() != null)
{
  String html = p.readLine();
  Document doc = Jsoup.parse(html);
  Elements Link = doc.select("a[href");
  for(Element element :Link)
  {   
    if(element != null)
    {
       System.out.println(element);
    }
  }
}

But I get many non-link strings in the output. How can I print only the links and nothing else?

According to this [question][1], what you're asking is against Google's TOS. [1]: http://stackoverflow.com/questions/3727662/how-can-you-search-google-programmatically-java-api – farrellmr, Jan 06 '14 at 09:34
Can you post the HTML code you're trying to parse? The Google search result page does not contain the results as direct HTML anchors. – StoopidDonut, Jan 06 '14 at 09:45
Exactly, I have this problem: the Google search result HTML is different. – user3132730, Jan 06 '14 at 13:47

score 0 · Answer 1 · answered Jan 06 '14 at 10:57

The selector "a[href" is incomplete: it is missing the closing bracket. Please try again with the complete selector:

Elements links = doc.select("a[href]"); // a with href

See the jsoup Selector documentation for the full supported syntax, especially the examples on the right side.
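
As a minimal sketch of how the corrected selector could be used to print only the link targets: the code below parses the HTML once and reads the href attribute of each match, instead of printing the whole element. The class name LinkExtractor, the helper extractLinks, and the sample HTML are assumptions made for illustration; they do not come from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for illustration only.
public class LinkExtractor {

    // Returns the href value of every <a> element that has an href attribute.
    public static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> links = new ArrayList<>();
        for (Element anchor : doc.select("a[href]")) {
            // attr("href") yields only the attribute value, not the full tag.
            links.add(anchor.attr("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample input (an assumption); the second anchor has no href and is skipped.
        String html = "<p><a href=\"http://example.com/a\">A</a> "
                + "<a name=\"x\">no href</a> "
                + "<a href=\"/b\">B</a></p>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Note that the loop in the question calls p.readLine() twice per iteration, so every other line is skipped, and it parses each line as a separate document. Parsing the file as a whole, for example with Jsoup.parse(new File("A.txt"), "UTF-8"), is likely to behave better.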

Jens A. Koch

If the links are in an iframe, you need to select that first. doc.select("iframe") will help you. – Jens A. Koch Jan 09 '14 at 21:50
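
A rough sketch of the iframe hint, assuming the goal is to find where the framed content lives: an iframe's content is a separate document, so selecting the iframe only gives its src URL, which would then have to be loaded and parsed on its own. The class name IframeSources, the helper iframeSources, and the sample HTML are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for illustration only.
public class IframeSources {

    // Returns the src value of every <iframe> element that has a src attribute.
    public static List<String> iframeSources(String html) {
        Document doc = Jsoup.parse(html);
        List<String> sources = new ArrayList<>();
        for (Element frame : doc.select("iframe[src]")) {
            sources.add(frame.attr("src"));
        }
        return sources;
    }

    public static void main(String[] args) {
        // Sample input (an assumption); the second iframe has no src and is skipped.
        String html = "<iframe src=\"http://example.com/frame.html\"></iframe><iframe></iframe>";
        for (String src : iframeSources(html)) {
            System.out.println(src);
        }
    }
}
```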
