21

Hi, I am writing a program that goes through many different URLs and checks whether each one exists, i.e. whether the response code returned is 404 or not. Since I am checking over 1000 URLs, I want this to run as quickly as possible. The following is my code; how can I modify it to run faster (if that is possible)?

final URL url = new URL("http://www.example.com");
HttpURLConnection huc = (HttpURLConnection) url.openConnection();
int responseCode = huc.getResponseCode();

if (responseCode != 404) {
    System.out.println("GOOD");
} else {
    System.out.println("BAD");
}

Would it be quicker to use JSoup?

I am aware that some sites return the code 200 and serve their own error page. However, I know that the links I am checking don't do this, so that case does not need handling.

M9A

3 Answers

32

Try sending a "HEAD" request instead of a "GET" request. That should be faster, since the response body is not downloaded.

huc.setRequestMethod("HEAD");

Also, instead of checking that the response status is not 404, check that it is 200. That is, test for the positive case rather than the negative one. 404, 403, 402 and the other 4xx statuses all mean, more or less, an invalid or non-existent URL.

You can also use multi-threading to make it even faster.
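As a rough illustration of the three suggestions together (HEAD request, checking for 200, several threads), here is a minimal sketch. The class name UrlChecker, the helper names headStatus and checkAll, and the pool size of 20 are assumptions made for this example, not something from the question. The status lookup is passed in as a function so it can be swapped out.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntFunction;

public class UrlChecker {

    // Sends a HEAD request and returns the status code, or -1 on any failure.
    static int headStatus(String address) {
        try {
            HttpURLConnection huc = (HttpURLConnection) new URL(address).openConnection();
            huc.setRequestMethod("HEAD");
            try {
                return huc.getResponseCode();
            } finally {
                huc.disconnect();
            }
        } catch (Exception e) {
            return -1;
        }
    }

    // Checks every URL on a fixed-size thread pool; true means the status was 200.
    static Map<String, Boolean> checkAll(List<String> urls,
                                         ToIntFunction<String> fetcher,
                                         int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all checks first so they run concurrently.
            List<Future<Boolean>> pending = new ArrayList<>();
            for (String u : urls) {
                pending.add(pool.submit(
                        () -> fetcher.applyAsInt(u) == HttpURLConnection.HTTP_OK));
            }
            // Collect results in the original order.
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                results.put(urls.get(i), pending.get(i).get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> urls = List.of("http://www.example.com");
        checkAll(urls, UrlChecker::headStatus, 20)
                .forEach((u, ok) -> System.out.println((ok ? "GOOD " : "BAD  ") + u));
    }
}
```

Note that HttpURLConnection follows redirects by default, so a redirected URL is judged by the status of its final target. A few servers also reject HEAD requests, in which case falling back to GET for those URLs may be needed.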

Vishnuprasad R
  • Quick question regarding this method - Is it possible to change referrer or user agent using this way? – M9A Aug 08 '13 at 20:09
  • to set user agent huc.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/4.0.249.0 Safari/532.5"); you can set referrer too using setRequestProperty() method. – Vishnuprasad R Aug 08 '13 at 20:22
  • to set user agent: huc.setRequestProperty("User-Agent","Your user agent") – Vishnuprasad R Aug 08 '13 at 20:24
  • to set the referrer: huc.setRequestProperty("Referer", "Your referrer URL"); (note that the HTTP header name is spelled "Referer") – Vishnuprasad R Aug 08 '13 at 20:26
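Putting the comments above together, a small sketch of setting both headers on a HEAD request might look like this. The class name HeaderExample, the helper name open, and the header values are placeholders chosen for illustration. The HTTP header for the referrer is spelled "Referer".

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class HeaderExample {

    // Builds a HEAD request with custom headers; nothing is sent until
    // getResponseCode() (or similar) is called on the returned connection.
    static HttpURLConnection open(String address, String userAgent, String referer)
            throws IOException {
        HttpURLConnection huc = (HttpURLConnection) new URL(address).openConnection();
        huc.setRequestMethod("HEAD");
        huc.setRequestProperty("User-Agent", userAgent);
        huc.setRequestProperty("Referer", referer);
        return huc;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection huc = open("http://www.example.com",
                "Your user agent", "http://www.example.com/");
        System.out.println(huc.getResponseCode());
        huc.disconnect();
    }
}
```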
1

Try asking the DNS server first. If the host name cannot be resolved, there is no point in sending an HTTP request to it:

import java.net.InetAddress;
import java.net.UnknownHostException;

class DNSLookup
{
    public static void main(String args[])
    {
        String host = "stackoverflow.com";
        try
        {
            // resolve the host name; throws UnknownHostException if it has no DNS record
            InetAddress inetAddress = InetAddress.getByName(host);
            // show the Internet Address as name/address
            System.out.println(inetAddress.getHostName() + " " + inetAddress.getHostAddress());
        }
        catch (UnknownHostException exception)
        {
            System.err.println("ERROR: No DNS record for '" + host + "'");
            exception.printStackTrace();
        }
    }
}
Khinsu
0

It seems you can set a timeout on the connection; make sure the value is acceptable for your case. And if you have many URLs to test, check them in parallel, which will be much faster. Hope this helps.
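For the timeout part, a minimal sketch could look like the following. The class name TimeoutCheck, the helper names open and exists, and the 3000 ms value are assumptions for illustration; pick a timeout that suits the servers being checked. In this sketch a host that times out or cannot be reached is simply reported as BAD.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutCheck {

    // Opens a connection with both timeouts set; no network traffic happens yet.
    static HttpURLConnection open(String address, int timeoutMillis) throws IOException {
        HttpURLConnection huc = (HttpURLConnection) new URL(address).openConnection();
        huc.setConnectTimeout(timeoutMillis); // limit on establishing the connection
        huc.setReadTimeout(timeoutMillis);    // limit on waiting for the response
        return huc;
    }

    // Same 404 test as in the question, but bounded by the timeouts above.
    static boolean exists(String address, int timeoutMillis) {
        try {
            HttpURLConnection huc = open(address, timeoutMillis);
            try {
                return huc.getResponseCode() != 404;
            } finally {
                huc.disconnect();
            }
        } catch (IOException e) {
            // Covers SocketTimeoutException and unreachable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        String address = "http://www.example.com";
        System.out.println(exists(address, 3000) ? "GOOD" : "BAD");
    }
}
```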

Spark8006