I have the code below, which crawls a website using JSoup, but I want to crawl multiple URLs at the same time. I stored the URLs in an array, but I can't get it to work: only the first URL is ever fetched. How could this code be made multithreaded, and is multithreading a good fit for this kind of application? I've put a rough sketch of what I think a threaded version might look like after the code.
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Webcrawler {
    public static void main(String[] args) throws IOException {
        String[] urls = {"http://www.dmoz.org/", "https://docs.oracle.com/en/"};

        for (String url : urls) {
            print("Site to be crawled: %s", url);
        }

        // Only the first URL is fetched here; this is the part I can't get
        // to work for all of the URLs in the array.
        print("\nFetching %s...", urls[0]);
        Document doc = Jsoup.connect(urls[0]).get();
        Elements links = doc.select("a");
        // doc.select("a[href*=https]") would select only links whose href contains "https"
        print("\nLinks: (%d)", links.size());
        for (Element link : links) {
            // absUrl("href") resolves the link against the page's base URL
            print(" (%s) %s", link.absUrl("href"), trim(link.text(), 35));
        }
    }

    private static void print(String msg, Object... args) {
        System.out.println(String.format(msg, args));
    }

    private static String trim(String s, int width) {
        if (s.length() > width)
            return s.substring(0, width - 1) + ".";
        else
            return s;
    }
}
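Since fetching pages is I/O-bound (each thread spends most of its time waiting on the network), I assume a fixed thread pool with one task per URL is the usual approach. Below is a rough, untested sketch of what I have in mind; the ThreadedCrawler name, the pool size, and the crawl helper are my own guesses rather than working code:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ThreadedCrawler {
    public static void main(String[] args) throws InterruptedException {
        String[] urls = {"http://www.dmoz.org/", "https://docs.oracle.com/en/"};

        // One worker thread per URL; a real crawler would cap the pool size
        ExecutorService pool = Executors.newFixedThreadPool(urls.length);
        for (String url : urls) {
            pool.submit(() -> crawl(url)); // each fetch runs on its own thread
        }
        pool.shutdown();                            // no new tasks accepted
        pool.awaitTermination(1, TimeUnit.MINUTES); // wait for the fetches to finish
    }

    private static void crawl(String url) {
        try {
            Document doc = Jsoup.connect(url).get();
            for (Element link : doc.select("a")) {
                System.out.printf("[%s] (%s)%n",
                        Thread.currentThread().getName(), link.absUrl("href"));
            }
        } catch (Exception e) {
            System.err.println("Failed to fetch " + url + ": " + e.getMessage());
        }
    }
}

Is an ExecutorService the right direction here, or is there a better pattern for a crawler like this?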