
I am new to scraping. I am trying to scrape data from a site using jsoup. I want to extract the data inside tags like <div>, <span>, <p>, etc. Can anybody tell me how to do this?

Pshemo

1 Answer


Here is a basic example:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {

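    public static void main(String[] args) throws Exception {
        String url = "https://stackoverflow.com/questions/2835505";
        // Download the page and parse it into a Document
        Document document = Jsoup.connect(url).get();

        // first() returns null when nothing matches, so check before using it
        Element firstDiv = document.select("div").first();
        if (firstDiv != null) {
            System.out.println(firstDiv.text());
        }

        // Print the href attribute of every link (<a> element) on the page
        Elements links = document.select("a");
        for (Element link : links) {
            System.out.println(link.attr("href"));
        }
    }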

}

This first prints the text of the first div on the page, then prints the href attribute of every link (<a> element) on the page.
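The question also mentions <span> and <p> tags; these work the same way, because select() takes any CSS selector. The sketch below assumes you can test against an inline HTML string instead of a live page (so it runs offline), and the base URI passed to Jsoup.parse is an assumption used only to show how "abs:href" turns a relative link into an absolute URL. The helper name textsOf is hypothetical, not part of jsoup.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TagTextExample {

    // Hypothetical helper: collects the text of every element matching a CSS query
    static List<String> textsOf(Document document, String cssQuery) {
        List<String> texts = new ArrayList<>();
        for (Element element : document.select(cssQuery)) {
            texts.add(element.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Inline HTML stands in for a downloaded page, so this runs offline
        String html = "<div><p>First paragraph</p> <span>A span</span></div>"
                + "<p>Second paragraph</p>"
                + "<a href='/questions/2835505'>A relative link</a>";
        // Assumed base URI, used only to resolve relative links such as the one above
        Document document = Jsoup.parse(html, "https://stackoverflow.com/");

        System.out.println(textsOf(document, "p"));    // [First paragraph, Second paragraph]
        System.out.println(textsOf(document, "span")); // [A span]

        // "abs:href" resolves the href against the document's base URI
        Element link = document.select("a").first();
        if (link != null) {
            System.out.println(link.attr("abs:href")); // https://stackoverflow.com/questions/2835505
        }
    }
}
```

When scraping a real page, replace Jsoup.parse(html, ...) with Jsoup.connect(url).get(); the connection sets the base URI from the URL, so "abs:href" works the same way.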


To get divs with a specific class, use Elements elements = document.select("div.someclass");

To get divs with a specific id, use Elements elements = document.select("div#someid");
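Here is a small runnable sketch of class and id selectors against an inline HTML string. The names someclass and someid are placeholders, and the helpers countByClass and textById are hypothetical names chosen for illustration. Since an id is meant to be unique within a page, the sketch takes only the first match.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectorExample {

    // Hypothetical helper: counts divs whose class list contains className
    static int countByClass(Document document, String className) {
        Elements byClass = document.select("div." + className);
        return byClass.size();
    }

    // Hypothetical helper: text of the div with the given id, or null if absent
    static String textById(Document document, String id) {
        Element element = document.select("div#" + id).first();
        return element == null ? null : element.text();
    }

    public static void main(String[] args) {
        // Placeholder markup: a class selector also matches elements with extra classes
        String html = "<div class='someclass'>One</div>"
                + "<div class='someclass other'>Two</div>"
                + "<div id='someid'>Three</div>";
        Document document = Jsoup.parse(html);

        System.out.println(countByClass(document, "someclass")); // 2
        System.out.println(textById(document, "someid"));        // Three
    }
}
```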

If you want to go through all the selected elements, do this:

for (Element e : elements) {
    System.out.println(e.text());
    // you can also read attributes with e.attr("name") or select further inside e
}
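Inside such a loop you can combine text and attributes. The sketch below, under the assumption of a hypothetical menu class and a hypothetical helper describeLinks, collects "text -> href" pairs for links inside matching divs. It also shows Elements.eachText() and Elements.eachAttr(...), which shorten the common loops but exist only in newer jsoup releases, so check your version before relying on them.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class IterateExample {

    // Hypothetical helper: "text -> href" for each link inside divs with the given class
    static List<String> describeLinks(Document document, String divClass) {
        List<String> result = new ArrayList<>();
        // "a[href]" keeps only <a> elements that actually have an href attribute
        Elements links = document.select("div." + divClass + " a[href]");
        for (Element e : links) {
            result.add(e.text() + " -> " + e.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder markup: only the first div has the assumed "menu" class
        String html = "<div class='menu'><a href='/a'>A</a><a href='/b'>B</a></div>"
                + "<div><a href='/c'>C</a></div>";
        Document document = Jsoup.parse(html);

        describeLinks(document, "menu").forEach(System.out::println);

        // Shortcuts for the loops above (newer jsoup versions only)
        Elements allLinks = document.select("a");
        System.out.println(allLinks.eachText());
        System.out.println(allLinks.eachAttr("href"));
    }
}
```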
Jonas Czech