115

What is the best way to check if a URL is valid in Java?

I tried calling new URL(urlString) and catching a MalformedURLException, but it seems to be happy with anything that begins with http://.

I'm not concerned about establishing a connection, just validity. Is there a method for this? An annotation in Hibernate Validator? Should I use a regex?

Edit: Some examples of accepted URLs are http://*** and http://my favorite site!.

Nikhil
Eric Wilson
  • 1
    How do you define validity if you're not going to establish a connection? – Michael Myers Feb 09 '10 at 16:32
  • 2
    Can you give an example of something which isn't a valid URL that the `URL` constructor accepts? – uckelman Feb 09 '10 at 16:33
  • 2
    @mmyers: Validity should be determined by RFCs 2396 and 2732, the ones which define what a URL is. – uckelman Feb 09 '10 at 16:34
  • What do you mean by *anything that begins with http://* ? Is `http://spaces allowed` valid? For instance `http://mylocalmachine` is still valid. – OscarRyz Feb 09 '10 at 16:35
  • 4
    @uckelman: Just about anything. "`http://***`" works. "`http://my favorite site!`" works. I can't get it to throw an exception (when http:// is at the beginning.) – Eric Wilson Feb 09 '10 at 16:37
  • This throws an exception: `http://example.com:80#foo/bar`, even though the URL is perfectly valid. – user123444555621 Dec 19 '12 at 18:26
  • 2
    possible duplicate of [Validating URL in Java](http://stackoverflow.com/questions/1600291/validating-url-in-java) – JasonB Jul 11 '13 at 21:41

9 Answers

113

Consider using the Apache Commons UrlValidator class:

import org.apache.commons.validator.routines.UrlValidator;

UrlValidator urlValidator = new UrlValidator();
boolean valid = urlValidator.isValid("http://my favorite site!"); // false

There are several properties that you can set to control how this class behaves. By default, http, https, and ftp are accepted.

Tendayi Mawushe
  • 8
    it does not appear to work with newer domains such as .london etc – V H Jun 09 '15 at 14:36
  • how about intranet urls? – Puneet Mar 23 '17 at 07:39
  • It doesn't validate urls with underscores. – Udit Kumawat Mar 02 '18 at 09:57
  • Does not work with new TLDs and local domain names, e.g. `local`, etc. –  Mar 11 '18 at 10:42
  • I could not get UrlValidator to work with our weird intranet top-level domain. The common ones like .com, .org, and such work. I am not interested in creating a RegExp for this, so `new URL(name).toURI()` became the solution. – Avec Apr 30 '19 at 15:14
  • This returns false on characters like "é". I know, for English speakers, you might not care. But there are a lot of characters like that out there, and if you want to build something international, you should keep that in mind. – Netsab612 Apr 14 '21 at 14:59
69

Here is a way I tried and found useful:

URL u = new URL(name); // this would check for the protocol
u.toURI(); // does the extra checking required for validation of URI 
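The two-step check above can be wrapped in a small helper. This is only a sketch: the class and method names (`UrlSyntaxCheck`, `isValid`) are made up for illustration, and it checks syntax only, so an unqualified host such as `http://google` still passes. Recent JDKs (20 and later) also mark the `URL` constructors as deprecated, so expect a compiler warning there.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlSyntaxCheck {
    // Hypothetical helper: true if the string parses both as a URL and as a URI.
    public static boolean isValid(String candidate) {
        try {
            // new URL(...) checks that a known protocol is present;
            // toURI() applies the stricter URI syntax rules (e.g. no spaces).
            new URL(candidate).toURI();
            return true;
        } catch (MalformedURLException | URISyntaxException e) {
            return false;
        }
    }
}
```

With this helper, `isValid("http://my favorite site!")` is rejected because of the spaces, and `isValid("example.com")` is rejected because there is no protocol.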
Kirk Woll
Prasanna Pilla
  • 2
    Good one. Using just new URL(name) accepts almost everything. The url.toURI(); is exactly what the developer is looking for - without using other libraries/frameworks! – justastefan Aug 28 '12 at 09:43
  • 2
    This will also not work for malformed URLs such as http:/google.com. I used UrlValidator from Apache Commons. – starf May 27 '14 at 16:02
  • 2
    This one is really dangerous. I see that there are lots of other articles out there with this example. `new URL("http://google").toURI();` will not throw an exception. – Sonu Oommen Jul 15 '19 at 11:24
  • 4
    @SonuOommen maybe because `new URL("http://google")` is valid^^ we have a lot of internal domains in my company like this – user43968 Jul 23 '20 at 09:25
8

I'd love to post this as a comment to Tendayi Mawushe's answer, but I'm afraid there is not enough space ;)

This is the relevant part from the Apache Commons UrlValidator source:

/**
 * This expression derived/taken from the BNF for URI (RFC2396).
 */
private static final String URL_PATTERN =
        "/^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?/";
//         12            3  4          5       6   7        8 9

/**
 * Schema/Protocol (ie. http:, ftp:, file:, etc).
 */
private static final int PARSE_URL_SCHEME = 2;

/**
 * Includes hostname/ip and port number.
 */
private static final int PARSE_URL_AUTHORITY = 4;

private static final int PARSE_URL_PATH = 5;

private static final int PARSE_URL_QUERY = 7;

private static final int PARSE_URL_FRAGMENT = 9;

You can easily build your own validator from there.
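As a rough sketch of what "building your own" could look like with `java.util.regex`: the pattern below is the quoted one without the Perl-style `/.../` delimiters. On its own it matches almost any string, since it only splits the input into components, so the checks on scheme and authority are where the actual validation happens. The class name `UrlParts`, the method `looksLikeHttpUrl`, and the conservative `AUTHORITY` rule are assumptions made for this example, not part of Commons Validator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlParts {
    // RFC 2396 component-splitting expression; groups 2, 4, 5, 7, 9 are
    // scheme, authority, path, query, and fragment.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Assumed rule for this sketch: ASCII letters, digits, dots, hyphens,
    // plus an optional numeric port. Real requirements may be looser.
    private static final Pattern AUTHORITY =
            Pattern.compile("[A-Za-z0-9.-]+(:\\d{1,5})?");

    public static boolean looksLikeHttpUrl(String candidate) {
        Matcher m = URL_PATTERN.matcher(candidate);
        if (!m.matches()) {
            return false;
        }
        String scheme = m.group(2);
        String authority = m.group(4);
        if (scheme == null || authority == null) {
            return false;
        }
        boolean schemeOk = scheme.equalsIgnoreCase("http")
                || scheme.equalsIgnoreCase("https");
        return schemeOk && AUTHORITY.matcher(authority).matches();
    }
}
```

Under these assumptions, `http://example.com:80#foo/bar` is accepted, while `http://***` and `http://my favorite site!` are rejected by the authority rule.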

Community
user123444555621
7

The most "foolproof" way is to check for the availability of URL:

public boolean isURL(String url) {
  try {
     (new java.net.URL(url)).openStream().close();
     return true;
  } catch (Exception ex) { }
  return false;
}
  • 2
    Actually querying a URL may result in a change, action, or tracking. OP wants to check validity without making the query. E.g., maybe this is to store now and execute later, with reasonable assurance it is valid. – Eric G Apr 29 '21 at 01:20
5

My favorite approach, without external libraries:

try {
    URI uri = new URI(name);

    // perform checks for scheme, authority, host, etc., based on your requirements

    if ("mailto".equals(uri.getScheme())) {/*Code*/}
    if (uri.getHost() == null) {/*Code*/}

} catch (URISyntaxException e) {
    // name is not a syntactically valid URI
}
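To make the placeholder checks concrete, here is a minimal sketch that accepts only http and https URIs with a parseable host. The names `HttpUriCheck` and `isHttpUri` are hypothetical, and the choice of allowed schemes is an assumption; adjust both to your requirements.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

public class HttpUriCheck {
    // Hypothetical helper: true for syntactically valid http/https URIs with a host.
    public static boolean isHttpUri(String candidate) {
        try {
            URI uri = new URI(candidate);
            String scheme = uri.getScheme();
            if (scheme == null) {
                return false; // relative reference, no scheme
            }
            String lower = scheme.toLowerCase(Locale.ROOT);
            boolean schemeOk = lower.equals("http") || lower.equals("https");
            // getHost() is null when the authority is not a server-based host
            return schemeOk && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false; // illegal characters or malformed structure
        }
    }
}
```

The host check is what rejects inputs like `http://***`, and the exception path handles strings with illegal characters such as spaces.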
Steve
Andrei Volgin
4

I didn't like any of the implementations (because they use a regex, which is an expensive operation, or a library, which is overkill if you only need one method), so I ended up using the java.net.URI class with some extra checks, and limiting the protocols to: http, https, file, ftp, mailto, news, urn.

And yes, catching exceptions can be an expensive operation, but probably not as bad as Regular Expressions:

final static Set<String> protocols, protocolsWithHost;

static {
  protocolsWithHost = new HashSet<String>( 
      Arrays.asList( new String[]{ "file", "ftp", "http", "https" } ) 
  );
  protocols = new HashSet<String>( 
      Arrays.asList( new String[]{ "mailto", "news", "urn" } ) 
  );
  protocols.addAll(protocolsWithHost);
}

public static boolean isURI(String str) {
  int colon = str.indexOf(':');
  if (colon < 3)                      return false;

  String proto = str.substring(0, colon).toLowerCase();
  if (!protocols.contains(proto))     return false;

  try {
    URI uri = new URI(str);
    if (protocolsWithHost.contains(proto)) {
      if (uri.getHost() == null)      return false;

      String path = uri.getPath();
      if (path != null) {
        for (int i=path.length()-1; i >= 0; i--) {
          if ("?<>:*|\"".indexOf( path.charAt(i) ) > -1)
            return false;
        }
      }
    }

    return true;
  } catch ( Exception ex ) {}

  return false;
}
isapir
3

Judging by the source code for URL, the

public URL(URL context, String spec, URLStreamHandler handler)

constructor does more validation than the other constructors. You might try that one, but YMMV.

Eric G
uckelman
2

validator package:

There seems to be a nice package by Yonatan Matalon called UrlUtil. Quoting its API:

isValidWebPageAddress(java.lang.String address, boolean validateSyntax, 
                      boolean validateExistance) 
Checks if the given address is a valid web page address.

Sun's approach - check the network address

Sun's Java site offers a connection attempt as a solution for validating URLs.

Other regex code snippets:

There are regex validation attempts at Oracle's site and weberdev.com.

Adam Matan
0

There is also a function in org.apache.xerces.util.URI:

isWellFormedAddress(java.lang.String address)

Determine whether a string is syntactically capable of representing a valid IPv4 address, IPv6 reference or the domain name of a network host.

Julian