I am trying to write a simple crawler with Jsoup. It finds all links in a website's source code and then follows them, searching each linked page for new links, and so on.
The problem is that the run time becomes quite long once the crawl goes more than two levels of links deep.
This is the pseudocode of how it works:
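void follow_links(String[] links) throws IOException
{
    for (int i = 0; i < links.length; i++)
    {
        // connect() only builds the request; get() fetches and parses the page
        Document doc = Jsoup.connect(links[i]).get();
        // parse() collects the links found in the page
        String[] newlinks = parse(doc);
        ...
    }
}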
My question is whether the code would be faster if I created a new Thread in each iteration of the loop, so that all connections are established in parallel. The connect call takes some time to return, so I suppose the requests end up queued one after another. Can threading solve this issue?
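
To make the idea concrete, the parallel variant I have in mind would look roughly like the sketch below. It is only a sketch under some assumptions: it uses a fixed-size thread pool (ExecutorService) instead of one raw Thread per link, so the number of simultaneous connections stays bounded, and the names ParallelCrawler, followLinks, fetcher and threads are made up for illustration. The fetcher argument is a stand-in for "download the page and return its links"; the demo in main uses a stub that just sleeps, so the example runs without Jsoup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelCrawler {

    // Fetches every URL on a bounded thread pool and returns all links found,
    // grouped in the same order as the input URLs.
    // `fetcher` is a hypothetical stand-in for "download the page and extract its links".
    static List<String> followLinks(List<String> urls,
                                    Function<String, List<String>> fetcher,
                                    int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all requests up front so the network waits overlap.
            List<Future<List<String>>> pending = new ArrayList<>();
            for (String url : urls) {
                pending.add(pool.submit(() -> fetcher.apply(url)));
            }
            List<String> newLinks = new ArrayList<>();
            for (Future<List<String>> result : pending) {
                try {
                    newLinks.addAll(result.get()); // blocks until that page is done
                } catch (ExecutionException e) {
                    // One failed page should not abort the whole batch.
                    System.err.println("fetch failed: " + e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag and stop waiting
                    break;
                }
            }
            return newLinks;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stub fetcher: sleeps to imitate network latency, then returns two fake links.
        Function<String, List<String>> slowStub = url -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return List.of(url + "/a", url + "/b");
        };
        List<String> seeds = List.of("http://example.com/1", "http://example.com/2",
                                     "http://example.com/3", "http://example.com/4");
        long start = System.nanoTime();
        List<String> found = followLinks(seeds, slowStub, 4);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(found.size() + " links in about " + elapsedMs + " ms");
    }
}
```

In the real crawler the fetcher would presumably wrap Jsoup.connect(url).get() and collect the a[href] elements, for example via doc.select("a[href]") and absUrl("href"). Since each request spends most of its time waiting on the network, I would expect the four stub fetches above to take roughly as long as one, not four times as long, but I have not measured this against real sites.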