
I am reading a text file that contains HTML code from Google search results. Then I parse it and I try to extract the links with this code:

FileReader in = new FileReader("A.txt");
BufferedReader p = new BufferedReader(in);
while(p.readLine() != null)
{
  String html = p.readLine();
  Document doc = Jsoup.parse(html);
  Elements Link = doc.select("a[href");
  for(Element element :Link)
  {   
    if(element != null)
    {
       System.out.println(element);
    }
  }
}

But the output contains many strings that are not links. How can I print only the links and nothing else?

joragupra
user3132730
  • According to this question, what you're asking is against Google's TOS: http://stackoverflow.com/questions/3727662/how-can-you-search-google-programmatically-java-api – farrellmr Jan 06 '14 at 09:34
  • Can you post the HTML code you're trying to parse? The Google search result page does not contain your results as direct HTML anchors. – StoopidDonut Jan 06 '14 at 09:45
  • Exactly, I have this problem: the Google search result HTML is different. – user3132730 Jan 06 '14 at 13:47

1 Answer


Please try again with a complete selector, not only "a[href":

Elements links = doc.select("a[href]"); // a with href

See the Selector documentation for the full selector syntax that is supported, especially the examples on the right side.
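Two other details in the question's loop are worth noting. It calls readLine() twice per iteration, so every other line is discarded, and it prints the whole element rather than its href value. A minimal sketch of one way to address both, assuming jsoup is on the classpath: the class name LinkExtractor and the helper extractLinks are hypothetical names chosen for illustration, and the file is assumed to be UTF-8. It reads the file in one go, parses it once, and collects only the href attribute of each matching anchor.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of the original question or answer.
public class LinkExtractor {

    // Returns the href value of every <a> element that has an href attribute.
    static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> links = new ArrayList<>();
        for (Element anchor : doc.select("a[href]")) {
            // attr("href") yields only the attribute value, not the whole tag.
            links.add(anchor.attr("href"));
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Path defaults to the file name used in the question.
        Path path = Paths.get(args.length > 0 ? args[0] : "A.txt");
        if (!Files.exists(path)) {
            System.err.println("File not found: " + path);
            return;
        }
        // Read the whole file at once instead of parsing line by line
        // (assumes UTF-8 encoding).
        String html = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

This only prints whatever href values are present in the saved file. As the comments on the question point out, Google's result markup may not expose results as plain anchors, so the printed values may still need further filtering.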

Jens A. Koch