
I'm using HttpURLConnection to validate URLs coming out of a database. With certain URLs I get an exception. I assume they are timing out, but they are in fact reachable (no 400-range error).

Increasing the timeout doesn't seem to matter; I still get an exception. Is there a second check I could do in the catch block to verify whether the URL is really bad? The relevant code is below. It works with 99.9% of URLs; it's the other 0.1% that fail.

try {
    HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
    connection.setConnectTimeout(timeout);
    connection.setReadTimeout(timeout);
    connection.setRequestMethod("GET");
    connection.setRequestProperty("User-Agent",
            "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13");
    connection.connect();
    int responseCode = connection.getResponseCode();
    // Treat every 4xx and 5xx status as a failed check (400 itself included).
    if (responseCode >= 400)
    {
        String prcMessage = "ERROR: URL " + url + " not found, response code was " + responseCode + "\r";
        System.out.println(prcMessage);
        VerifyUrl.writeToFile(prcMessage);
        return false;
    }
}
catch (IOException exception)
{
    // IOException covers more than timeouts (DNS failures, refused connections,
    // redirect loops), so the underlying cause is logged as well.
    String errorMessage = "ERROR: URL " + url + " did not load (timeout was " + timeout + " milliseconds): " + exception;
    System.out.println(errorMessage);
    VerifyUrl.writeToFile(errorMessage);
    return false;
}
Draken
Sulteric
  • You could use a regex. https://docs.oracle.com/javase/tutorial/essential/regex/ – aleb2000 Dec 15 '16 at 15:03
  • You could use Apache Commons UrlValidator. https://commons.apache.org/proper/commons-validator/apidocs/org/apache/commons/validator/routines/UrlValidator.html – ntalbs Dec 15 '16 at 15:05

1 Answer


It depends on what you want to check, but I think the question "Validating URL in Java" has you covered.

You have two possibilities:

  1. Check syntax ("Is this URL a real URL or just made up?")

    There is a lot of material describing how to do this; the governing specification is RFC 3986. Someone has most likely implemented such a check already, so look for an existing library before writing your own.

  2. Check the semantics ("Is the URL available?")

    There is not really a faster way to do that than sending a request, though there are several tools for sending HTTP requests in Java. You could send a HEAD request instead of GET: the response to a HEAD request has no body, so requests may complete faster and time out less often.
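To make the two options concrete, here is a minimal sketch using only the JDK. The class name UrlCheck and the methods hasValidSyntax and isReachable are made up for illustration and are not from the question. The syntax check relies on java.net.URI, which is stricter than a quick regex but is not a full RFC 3986 validator, and it assumes only http and https URLs count as valid. The availability check assumes any status below 400 means "reachable", and some servers answer HEAD differently from GET, so treat a failure as a hint rather than proof.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, for illustration only.
class UrlCheck {

    // Option 1: syntax only. Parses the string with java.net.URI and
    // requires an http(s) scheme plus a host. No network access.
    static boolean hasValidSyntax(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            return uri.getHost() != null
                    && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Option 2: availability. Sends a HEAD request and treats any status
    // below 400 as reachable (an assumption; adjust to your needs).
    static boolean isReachable(String url, int timeoutMillis) {
        if (!hasValidSyntax(url)) {
            return false;
        }
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            connection.setRequestMethod("HEAD");
            return connection.getResponseCode() < 400;
        } catch (IOException e) {
            // Timeouts, DNS failures, refused connections and redirect loops all end up here.
            return false;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
```

Running the cheap syntax check first means obviously malformed entries never cost a network round trip.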

getjackx
  • The exception appears to be caused by one URL triggering "too many redirects". I set both setInstanceFollowRedirects and setFollowRedirects to true and it still throws the exception. Not sure how to work around it. – Sulteric Dec 15 '16 at 15:40
  • It could also be the server's fault for implementing a bad redirect. Usually there is a Location header that can be followed. I don't know how you are handling the redirect. The easiest option would be to close the old connection and open a new one to the new location. – getjackx Dec 15 '16 at 15:45
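The suggestion to drop the old connection and follow the Location header yourself could look roughly like the sketch below. RedirectFollower, resolveLocation, finalStatus and the maxHops parameter are hypothetical names chosen for this example. It turns off automatic redirects, so a redirect loop ends with a return value of -1 once the hop limit is reached instead of an exception, and relative Location values are resolved against the current URL. It assumes every hop is an http or https URL and that the Location value is a well-formed URI; a malformed one would make resolve throw IllegalArgumentException.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper class, for illustration only.
class RedirectFollower {

    // Resolves a Location header (absolute or relative) against the URL that returned it.
    static URI resolveLocation(URI current, String location) {
        return current.resolve(location);
    }

    // Follows redirects by hand. Returns the first non-redirect status code,
    // or -1 if maxHops is exceeded or a redirect carries no Location header.
    static int finalStatus(String url, int timeoutMillis, int maxHops) throws IOException {
        URI current = URI.create(url);
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection connection = (HttpURLConnection) current.toURL().openConnection();
            try {
                connection.setInstanceFollowRedirects(false); // redirects are handled below
                connection.setConnectTimeout(timeoutMillis);
                connection.setReadTimeout(timeoutMillis);
                connection.setRequestMethod("HEAD");
                int status = connection.getResponseCode();
                boolean redirect = status == 301 || status == 302 || status == 303
                        || status == 307 || status == 308;
                if (!redirect) {
                    return status;
                }
                String location = connection.getHeaderField("Location");
                if (location == null) {
                    return -1;
                }
                current = resolveLocation(current, location);
            } finally {
                connection.disconnect(); // close the old connection before the next hop
            }
        }
        return -1;
    }
}
```

A call such as RedirectFollower.finalStatus(url, timeout, 10) from the catch block would give a second opinion on URLs that fail the first check; the limit of 10 is an arbitrary choice.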