I'm trying to find the most efficient way to test 300,000+ URLs stored in a database, essentially checking whether each URL is still valid. Having looked around the site I've found many excellent answers and am now using something along the lines of:
Read URL from file.... Test URL:
final URL url = new URL("http://" + address);
final HttpURLConnection urlConn = (HttpURLConnection) url.openConnection();
urlConn.setConnectTimeout(1000 * 10);
urlConn.connect();
urlConn.getResponseCode(); // Do something with the code
urlConn.disconnect();
Write details back to file....
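In case it helps, the surrounding loop looks roughly like this (simplified; "urls.txt", "results.txt" and the way I record the result are just placeholders, not my real file handling):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.URL;

public class UrlChecker {
    public static void main(String[] args) throws IOException {
        // "urls.txt" and "results.txt" are placeholder names for my input/output files
        final BufferedReader reader = new BufferedReader(new FileReader("urls.txt"));
        final PrintWriter writer = new PrintWriter(new FileWriter("results.txt"));
        String address;
        while ((address = reader.readLine()) != null) {
            int code = -1; // -1 marks URLs that could not be reached at all
            try {
                final URL url = new URL("http://" + address);
                final HttpURLConnection urlConn = (HttpURLConnection) url.openConnection();
                urlConn.setConnectTimeout(1000 * 10);
                urlConn.connect();
                code = urlConn.getResponseCode();
                urlConn.disconnect();
            } catch (IOException e) {
                // unreachable or malformed URL - recorded as -1
            }
            writer.println(address + "\t" + code);
        }
        writer.close();
        reader.close();
    }
}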
So a few questions: 1) Is there a more efficient way to test URLs and get their response codes?
2) Initially I am able to test about 50 URLs per minute, but after 5 minutes or so things really slow down. I imagine there's some resource I'm not releasing, but I'm not sure what.
3) Certain URLs (e.g. www.bhs.org.au) cause the above to hang for minutes (not good when I have so many URLs to test), even with the connect timeout set. Is there any way I can tighten this up? (I've put a rough sketch of what I was wondering about below.)
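For question 3, I've been wondering whether also setting a read timeout (and maybe switching to a HEAD request so no body is downloaded) would stop the hangs, something along these lines, though I'm not sure it's the right approach, whether it helps with the slowdown, and I've read some servers reject HEAD requests:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class HeadCheck { // hypothetical class/method names, just for illustration
    static int checkUrl(String address) throws IOException {
        final URL url = new URL("http://" + address);
        final HttpURLConnection urlConn = (HttpURLConnection) url.openConnection();
        urlConn.setConnectTimeout(1000 * 10); // limit time spent establishing the connection
        urlConn.setReadTimeout(1000 * 10);    // limit time spent waiting for the response
        urlConn.setRequestMethod("HEAD");     // ask for headers only, not the page body
        final int code = urlConn.getResponseCode();
        urlConn.disconnect();
        return code;
    }
}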
Thanks in advance for any help; it's been quite a few years since I've written any code and I'm starting again from scratch :-)