118

Are there any standard APIs in Java for validating a given URL? I want to check both that the URL string is well formed (that is, the given protocol is valid) and that a connection can be established.

I tried using HttpURLConnection, providing the URL and connecting to it. The first part of my requirement seems to be fulfilled, but when I call HttpURLConnection.connect(), a 'java.net.ConnectException: Connection refused' exception is thrown.

Could this be caused by proxy settings? I tried setting the system properties for the proxy, but without success.

Let me know what I am doing wrong.

Matthew Murdoch
Keya
  • 2
    There seem to be 2 questions here; URL validation and finding the cause of a ConnectException – Ben James Oct 21 '09 at 12:19
  • Since this is the first Google hit for `java url validator`: there are indeed two questions here, how to validate the URL (by looking at the string) and how to check whether the URL is reachable (via an HTTP connection, for example). – vikingsteve Dec 14 '16 at 08:54

11 Answers11

174

For the benefit of the community, since this thread is the top Google result when searching for "url validator java":


Catching exceptions is expensive, and should be avoided when possible. If you just want to verify your String is a valid URL, you can use the UrlValidator class from the Apache Commons Validator project.

For example:

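import org.apache.commons.validator.routines.UrlValidator;

// Accept only these schemes; the default set is "http", "https", "ftp".
String[] schemes = {"http", "https"};
UrlValidator urlValidator = new UrlValidator(schemes);
// "ftp" is not among the allowed schemes, so this prints "URL is invalid".
if (urlValidator.isValid("ftp://foo.bar.com/")) {
   System.out.println("URL is valid");
} else {
   System.out.println("URL is invalid");
}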
informatik01
Yonatan
  • 41
    That URLValidator class is marked deprecated. The recommended URLValidator is in the routines package: http://commons.apache.org/validator/apidocs/org/apache/commons/validator/routines/UrlValidator.html – Spektr Mar 23 '11 at 16:46
  • is there a similar library for validating email addresses as well? – Conrad.Dean Mar 26 '12 at 19:12
  • 24
    I fail to see how this is **standard API** – arkon Jul 27 '13 at 05:14
  • 5
    UrlValidator has its own set of known issues. Is there an alternate library that is being maintained more actively? – Alex Averbuch Aug 13 '13 at 13:52
  • 11
    @AlexAverbuch: can you please outline what the issues are with UrlValidator? It's not very helpful to just say they exist but not say what they are. – cdmckay May 13 '15 at 19:56
  • Try newer top-level domains such as something.london or something.anothercity, which are available now. – V H Jun 09 '15 at 14:37
  • 1
    @AlexAverbuch: It seems that Commons Validator issues are getting fixed in a rather timely manner: https://issues.apache.org/jira/browse/VALIDATOR-283?jql=project%20%3D%20VALIDATOR%20AND%20status%20in%20%28Open%2C%20%22In%20Progress%22%2C%20Reopened%29 If you find any other issue, please report it, thanks! – Nicolas Raoul Nov 11 '15 at 07:13
  • 2
    We use security scanning software to identify security vulnerabilities in third party libraries, and unfortunately commons-validator contains commons-beanutils which is identified red (security vulnerability). Is there another (slimmer) library / API ? – vikingsteve Dec 14 '16 at 08:52
  • @vikingsteve org.apache.commons.validator.routines.UrlValidator doesn't use beanutils (at least in the latest version, 1.5.1). Perhaps you can just exclude the beanutils dependency? – Yonatan Dec 15 '16 at 15:37
  • 2
    A question about the commons library: Why aren't these functions simple static functions? Why do I need to create a UrlValidator object to validate 1 URL? What utility do they get out of having that "state"? – Parth Mehrotra Aug 08 '17 at 15:32
  • 2
    @ParthMehrotra I'm 4 years late, but the main reason for this is that you can mock the validation in tests, and also you can register the validator as a bean and configure it only once. – Lakatos Gyula Oct 03 '21 at 12:28
40

The java.net.URL class is in fact not at all a good way of validating URLs. MalformedURLException is not thrown for all malformed URLs during construction. Catching IOException on java.net.URL#openConnection().connect() does not validate the URL either; it only tells you whether or not the connection can be established.

Consider this piece of code:

    try {
        new URL("http://.com");
        new URL("http://com.");
        new URL("http:// ");
        new URL("ftp://::::@example.com");
    } catch (MalformedURLException malformedURLException) {
        malformedURLException.printStackTrace();
    }

...which does not throw any exceptions.

I recommend using a validation API implemented with a context-free grammar or, for very simplified validation, regular expressions. However, I need someone to suggest a superior or standard API for this; I only recently started searching for one myself.

Note: It has been suggested that URL#toURI(), combined with handling java.net.URISyntaxException, can facilitate validation of URLs. However, this method catches only one of the very simple cases above.

The conclusion is that there is no standard Java URL parser to validate URLs.

Martin
  • Have you found a solution to this problem?? – kidd0 Mar 29 '14 at 05:36
  • @bi0s.kidd0 There are several libraries that can be used, but we decided to roll our own. It's not complete, but can parse what we are interested in, including URLs containing either domains or IPs (both v4 and v6). https://github.com/jajja/arachne – Martin Apr 04 '14 at 10:35
34

You need to create both a URL object and a URLConnection object. The following code will test both the format of the URL and whether a connection can be established:

try {
    URL url = new URL("http://www.yoursite.com/");
    URLConnection conn = url.openConnection();
    conn.connect();
} catch (MalformedURLException e) {
    // the URL is not in a valid form
} catch (IOException e) {
    // the connection couldn't be established
}
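
A minimal sketch of a cheaper reachability probe than a bare connect(), for HTTP(S) URLs only. The class name ReachabilityCheck, the method name isReachable and the timeout parameter are illustrative assumptions, not part of the answer above. It sends a HEAD request with explicit timeouts, so an unreachable host fails after a bounded wait, and it treats any HTTP status as "reachable".

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLConnection;

public class ReachabilityCheck {

    /**
     * Returns true if an HTTP(S) server answered the request at all,
     * regardless of the status code. Hypothetical helper for illustration.
     */
    public static boolean isReachable(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            // openConnection() only creates the object; no network I/O happens yet.
            URLConnection raw = new URI(url).toURL().openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return false; // e.g. ftp: or file: URLs are out of scope here
            }
            conn = (HttpURLConnection) raw;
            conn.setRequestMethod("HEAD");          // headers only, no body
            conn.setConnectTimeout(timeoutMillis);  // bound the TCP connect
            conn.setReadTimeout(timeoutMillis);     // bound the wait for a response
            return conn.getResponseCode() > 0;      // -1 means no valid HTTP response
        } catch (URISyntaxException | IllegalArgumentException | IOException e) {
            // Bad syntax, relative URI, unknown protocol, refused connection, timeout...
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

Note that this still tells you nothing about whether the string is a well-formed URL in any strict sense; it only reports whether something answered. On Android it would have to run off the main thread.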
Olly
  • Note there are multiple ways of checking for malformed URLs or other problems. For example, if you will be using your URL for a `new HttpGet(url)`, then you can catch the `IllegalArgumentException` that `HttpGet(...)` throws if the URL is malformed. And `HttpResponse` will throw exceptions at you too if there's a problem with getting the data. – Peter Ajtai Nov 15 '11 at 17:14
  • 2
    Connecting validates only host availability. It has nothing to do with the validity of the URL. – dernasherbrezon Nov 02 '12 at 07:07
  • 2
    MalformedURLException is not a safe strategy to test the valid form of a URL. This answer is misleading. – Martin Feb 01 '13 at 13:58
  • 1
    @Martin: can you elaborate *why* it isn't safe? – Jeroen Vannevel Jan 24 '14 at 01:44
  • @JeroenVannevel I already have, in an answer to the OP question. The fact that the constructor throws MalformedURLException does not mean that the format is validated. – Martin Jan 26 '14 at 21:04
  • @Martin: sorry, missed that post! – Jeroen Vannevel Jan 26 '14 at 21:08
  • 39
    This is very, very expensive. openConnection/connect will actually try to connect to the HTTP resource. This must be one of the most expensive ways I have ever seen to verify a URL. – Glenn Bech Jan 30 '14 at 12:37
  • Moreover, for any Android developers coming along: this solution must run on a background thread (or in an AsyncTask), otherwise you will get *android.os.NetworkOnMainThreadException*. – A.B. Feb 10 '18 at 15:53
23

Using only the standard API, pass the string to a URL object and then convert it to a URI object. This determines the validity of the URL according to the RFC 2396 standard.

Example:

public boolean isValidURL(String url) {

    try {
        new URL(url).toURI();
    } catch (MalformedURLException | URISyntaxException e) {
        return false;
    }

    return true;
}
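
A possible tightening of the check above, offered as a sketch rather than a definitive validator. Strings such as "http:/test.com" or "http:test.com" parse as URIs without error, so this version additionally requires an allowed scheme and a host that java.net.URI could parse. The class name StrictUrlCheck, the method name isValidHttpUrl and the choice of http/https as the allowed schemes are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

public class StrictUrlCheck {

    // Assumption: only web URLs are of interest; adjust as needed.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    /** Hypothetical helper: syntax check plus "has an allowed scheme and a host". */
    public static boolean isValidHttpUrl(String candidate) {
        if (candidate == null) {
            return false;
        }
        try {
            URI uri = new URI(candidate); // throws on illegal characters etc.
            String scheme = uri.getScheme();
            return scheme != null
                    && ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))
                    // getHost() is null for opaque URIs, for URIs without an
                    // authority, and for authorities that are not server-based.
                    && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

This is still syntax-only and follows java.net.URI's RFC 2396 rules, so it will not match every expectation (for example, it knows nothing about which top-level domains exist).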
proski
arkon
  • 13
    Note that this string->url->uri validation scheme reports that these test cases are valid: "http://.com" "http://com." "ftp://::::@example.com" "http:/test.com" "http:test.com" "http:/:" So while this is standard API, the validation rules it applies may not be what one expects. – DaveK Oct 28 '13 at 17:34
10

There is a way to perform URL validation in strict accordance with standards in Java without resorting to third-party libraries:

boolean isValidURL(String url) {
  try {
    new URI(url).parseServerAuthority();
    return true;
  } catch (URISyntaxException e) {
    return false;
  }
}

The constructor of URI checks that url is a valid URI, and the call to parseServerAuthority ensures that it is a URL (absolute or relative) and not a URN.

dened
  • The exception is thrown "If the authority component of this URI is defined but cannot be parsed as a server-based authority according to RFC 2396". While this is much better than most other proposals, it cannot validate a URL. – Martin Apr 16 '19 at 10:52
  • @Martin, You forgot about the validation in the constructor. As I wrote, the combination of the `URI` constructor call and the `parseServerAuthority` call validates the URL, not `parseServerAuthority` alone. – dened Apr 18 '19 at 07:51
  • 1
    You can find examples on this page that are incorrectly validated by your suggestion. Refer to the documentation, and if a method is not designed for your intended use, please don't promote misusing it. – Martin May 08 '19 at 09:26
  • @Martin, Can you be more specific? Which examples in your opinion are incorrectly validated by this method? – dened May 09 '19 at 12:19
  • @Asu And this is a valid URL according to RFC 2396! `https` is both the scheme and the host there. – dened Nov 23 '19 at 14:47
  • @dened With two of "://"? – Asu Nov 24 '19 at 00:08
  • 1
    @Asu yes. The second `://` comes after the host, `:` introduces the port number, which can be empty according to the syntax. `//` is a part of the path with an empty segment, which is also valid. If you enter this address in your browser it will try to open it (but most probably won't find the server named `https` ;)). – dened Nov 24 '19 at 15:19
  • Sigh.. Good point. Completely counter-intuitive that they let the port be empty after the colon. – Asu Nov 25 '19 at 02:16
9

Use android.webkit.URLUtil on Android:

URLUtil.isValidUrl(URL_STRING);

Note: It checks only the initial scheme of the URL, not that the entire URL is valid.

penduDev
2

It is important to point out that the URL object handles both validation and connection. As a result, only protocols for which a handler has been provided in sun.net.www.protocol (file, ftp, gopher, http, https, jar, mailto, netdoc) are considered valid. For instance, try to create a new URL with the ldap protocol:

new URL("ldap://myhost:389")

You will get a java.net.MalformedURLException: unknown protocol: ldap.

To avoid this, you need to implement your own handler and register it through URL.setURLStreamHandlerFactory(). That is quite overkill if you just want to validate the URL syntax; a regexp seems to be a simpler solution.
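
As a hedged illustration of the point above: java.net.URI parses syntax without needing a protocol handler, while converting to a java.net.URL requires one. The class name SchemeSupport and both method names below are made up for this sketch.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;

public class SchemeSupport {

    /** Syntax-only check: java.net.URI needs no protocol handler. */
    public static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** True only if this JVM also has a URLStreamHandler for the scheme. */
    public static boolean hasStreamHandler(String s) {
        try {
            // toURL() throws MalformedURLException for an unknown protocol
            // and IllegalArgumentException for a non-absolute URI.
            new URI(s).toURL();
            return true;
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            return false;
        }
    }
}
```

With a stock JDK, "ldap://myhost:389" passes the first check and fails the second, so a URI-based check can be an alternative to a regexp when only the syntax matters.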

Doc Davluz
1

Are you sure you're using the correct proxy as system properties?

Also, if you are using Java 1.5 or 1.6, you could pass a java.net.Proxy instance to the openConnection() method. This is more elegant, in my opinion:

// Proxy instance: proxy IP 10.0.0.1, port 8080
Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("10.0.0.1", 8080));
URLConnection conn = new URL(urlString).openConnection(proxy);
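
For comparison, a sketch of the system-property route. One possible reason it appears to have no effect (an assumption; the question does not show which properties were set) is that https:// URLs read https.proxyHost and https.proxyPort, not the http.* pair. The class name ProxySettings and both method names are hypothetical; the property names are the standard Java networking properties.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

public class ProxySettings {

    /** Sets the JVM-wide proxy properties for both http:// and https:// URLs. */
    public static void useProxy(String host, int port) {
        String portText = Integer.toString(port);
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", portText);
        // https:// URLs use a separate pair of properties.
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", portText);
    }

    /** Asks the default ProxySelector which proxies it would pick for a URL. */
    public static List<Proxy> proxiesFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url));
    }
}
```

Printing proxiesFor(urlString) before connecting is one way to see whether the JVM would actually route the request through the proxy you expect.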
NickDK
  • Why would this be elegant or even correct? It uses expensive resources when it works, and it does not work for a correct URL that is not available for connection when tested. – Martin Sep 14 '18 at 15:08
0

I think the best response is from the user @b1nary.atr0phy. However, I recommend combining the method from b1nary.atr0phy's response with a regex to cover all the possible cases.

public static final URL validateURL(String url, Logger logger) {

        URL u = null;
        try {  
            Pattern regex = Pattern.compile("(?i)^(?:(?:https?|ftp)://)(?:\\S+(?::\\S*)?@)?(?:(?!(?:10|127)(?:\\.\\d{1,3}){3})(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,}))\\.?)(?::\\d{2,5})?(?:[/?#]\\S*)?$");
            Matcher matcher = regex.matcher(url);
            if(!matcher.find()) {
                throw new URISyntaxException(url, "The URL is not well formed.");
            }
            u = new URL(url);  
            u.toURI(); 
        } catch (MalformedURLException e) {
            logger.error("The URL is not well formed.");
        } catch (URISyntaxException e) {
            u = null; // do not return a URL that failed the regex or the URI conversion
            logger.error("The URL is not well formed.");
        }

        return u;  

    }
Genaut
  • 2
    There are a couple of problems with this regex: 1. URLs without the prefix are invalid, (e.g. "stackoverflow.com"), this also includes URLs with two suffixes if they're missing the prefix (e.g. "amazon.co.uk"). 2. IPs are always invalid (e.g. "ftp://127.0.0.1"), no matter if they use the prefix or not. I'd suggest using `"((http|https|ftp)://)?((\\w)*|([0-9]*)|([-|_])*)+([\\.|/]((\\w)*|([0-9]*)|([-|_])*))+"` ([source](https://stackoverflow.com/a/57219660/2016165)). The only downside to this regex is that e.g. "127.0..0.1" and "127.0" are valid. – Neph Mar 17 '20 at 14:23
0

This is what I use to validate CDN URLs (they must start with https, but that's easy to customise). This will also not allow using IP addresses.

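public static final boolean validateURL(String url) {
    // Requires the literal https:// prefix, an optional "www.", and a host ending in an alphabetic TLD.
    // The prefix must sit outside the [...] character class, otherwise it is not actually enforced.
    var regex = Pattern.compile("^https://(www\\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_+.~#?&/=]*)");
    var matcher = regex.matcher(url);
    return matcher.find();
}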
giacomello
-2

Thanks. Opening the URL connection by passing the Proxy as suggested by NickDK works fine.

// Proxy instance: proxy IP 10.0.0.1, port 8080
Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("10.0.0.1", 8080));
URLConnection conn = new URL(urlString).openConnection(proxy);

Setting the system properties, however, does not work, as I mentioned earlier.

Thanks again.

Regards, Keya

StarsSky
Keya