Today I started to play with JSoup. I wanted to see how powerful JSoup is, so I looked for a web page with a lot of elements and tried to retrieve all of them. I found what I was looking for: http://www.top1000.ie/companies.
This is a list of 1000 similar elements, one per company; only the text inside each one changes. That text is what I am trying to retrieve, but I can only get the first 20 elements, not the rest.
This is my simple code:
package retrieveInfo;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Retrieve {

    public static void main(String[] args) throws Exception {
        String url = "http://www.top1000.ie/companies";

        Document document = Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
                .timeout(1000 * 5)
                .get();

        Elements companies = document.body().select(".content .name");

        for (Element company : companies) {
            System.out.println("Company: " + company.text());
        }
    }
}
I thought the page might not have had time to load, which is why I added .timeout(1000*5) to wait 5 seconds, but I still only get the first 20 elements of the list.
Does JSoup have a limit on the number of elements you can retrieve from a web page? I don't think it should, since it seems to be designed for exactly this purpose, so I think I am missing something in my code.
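To narrow it down, one check I could run is to count the matches in the raw HTML without JSoup at all. This is only a sketch: the class name RawElementCount, the helper countOccurrences, and the marker string class="name" are my own assumptions (I am guessing the attribute is written exactly like that in the page source), and it needs Java 11 or later for java.net.http. If the raw response also contains only about 20 matches, that would suggest the limit is in what the server sends rather than in JSoup.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RawElementCount {

    // Counts non-overlapping occurrences of marker in text.
    static int countOccurrences(String text, String marker) {
        if (marker.isEmpty()) {
            return 0;
        }
        int count = 0;
        int index = text.indexOf(marker);
        while (index >= 0) {
            count++;
            index = text.indexOf(marker, index + marker.length());
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String url = "http://www.top1000.ie/companies";

        // Follow redirects in case the site answers with one.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();

        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent",
                        "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
                .GET()
                .build();

        // Fetch the response body as plain text, with no HTML parsing involved.
        String html = client.send(request, HttpResponse.BodyHandlers.ofString()).body();

        // Assumption: each company name is marked up with exactly class="name".
        System.out.println("Raw matches: " + countOccurrences(html, "class=\"name\""));
    }
}
```

Comparing this number with the size of companies in my code above should tell me whether JSoup is dropping anything.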
Any help would be appreciated. Thanks in advance!