
First, I want to thank you all in advance for taking the time to help.

Next, I want to point out that I have already read this answer. When I inspect an element in Google Chrome on Stack Overflow, the markup is really easy to understand, but on the webpage listed below it's kind of messy.

I want to load information about the companies listed on this webpage: http://www.manta.com/mb_51_ALL_CVZ/carlstadt_nj?pg=1

Finally, this is my current code:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) throws Exception {
        String url = "http://www.manta.com/mb_51_ALL_CVZ/carlstadt_nj?pg=1";
        Document doc = Jsoup.connect(url).get();

        // I want to retrieve the address, the telephone number and the description
        // of each company listed on the website I provided.
        // The empty strings are placeholders: these are the selectors I can't work out.
        String address = doc.select("").text();
        String telephone = doc.select("").text();
        String description = doc.select("").text();
    }
}
```

1 Answer


First of all, set a User-Agent string, so that the page your program receives is the same one your browser gets:

```java
Document doc = Jsoup.connect(url)
        .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0")
        .get();
```

The selector for the entire list is `ul.list-group:nth-child(4)`, and the selector for each row is `ul.list-group:nth-child(4) > li:nth-child(X) > div:nth-child(1)`, where X is a number between 1 and the number of rows.
Inside each row you can easily find the selectors for the address, phone and so on with your browser. For example, the address from the first row is given by `ul.list-group:nth-child(4) > li:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(3) > div:nth-child(1) > div:nth-child(1) > div:nth-child(2) > span:nth-child(1)`.
Then just loop through all the rows and extract whatever you need.
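
A minimal sketch of that loop, assuming jsoup is on the classpath and that the Manta page still has the layout described above (it may well have changed since). It follows the index-based approach literally: count the rows, then build `li:nth-child(X)` for X from 1 to that count and append the address path from the first-row example. `MantaScraper`, `ROW_LIST`, `ADDRESS_PATH` and `addresses` are hypothetical names for illustration. Phone and description paths are not spelled out here, so they are left as a comment to be filled in from the browser inspector.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical class name; a sketch, not a tested scraper for the live site.
class MantaScraper {

    // Rows of the results list (selector from the answer above).
    static final String ROW_LIST = "ul.list-group:nth-child(4) > li";

    // Path from a row's <li> down to its address <span>, copied from the
    // first-row example. Assumption: every row uses the same layout.
    static final String ADDRESS_PATH =
            " > div:nth-child(1) > div:nth-child(1) > div:nth-child(1)"
            + " > div:nth-child(3) > div:nth-child(1) > div:nth-child(1)"
            + " > div:nth-child(2) > span:nth-child(1)";

    // Loops X = 1..rowCount and applies the per-row address selector.
    static List<String> addresses(Document doc) {
        int rowCount = doc.select(ROW_LIST).size();
        List<String> result = new ArrayList<>();
        for (int x = 1; x <= rowCount; x++) {
            String query = ROW_LIST + ":nth-child(" + x + ")" + ADDRESS_PATH;
            // Elements.text() is "" when nothing matches, so a row with a
            // different layout yields an empty string rather than an exception.
            result.add(doc.select(query).text());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String url = "http://www.manta.com/mb_51_ALL_CVZ/carlstadt_nj?pg=1";
        Document doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0")
                .get();
        for (String address : addresses(doc)) {
            System.out.println(address);
        }
        // Phone and description: inspect a row the same way and add their
        // paths next to ADDRESS_PATH (not given here).
    }
}
```

Because `addresses` takes a `Document`, it can be tried on a page saved locally with `Jsoup.parse(...)` before hitting the live site. Iterating directly over the `Elements` returned by `doc.select(ROW_LIST)` would be an equivalent alternative to building an `nth-child(X)` selector per row.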

TDG