13

Solution

Okay, I found one solution on Stack Overflow after a little more searching, but I hope to do it without extra libraries: How to check for a valid URL in Java?

My problem:

First off, hopefully this is not a duplicate, but I could not find the right answer (right away). I would like to validate that a URI (http) is valid in Java. I came up with the following tests, but I can't get them to pass. At first I used getPort(), but getPort() returns -1 for http://www.google.nl. These are the tests I want to pass:

Test:

@Test
public void testURI_Isvalid() throws Exception {
    assertFalse(HttpUtils.validateHTTP_URI("ttp://localhost:8080"));
    assertFalse(HttpUtils.validateHTTP_URI("ftp://localhost:8080"));
    assertFalse(HttpUtils.validateHTTP_URI("http://localhost:8a80"));
    assertTrue(HttpUtils.validateHTTP_URI("http://localhost:8080"));
    final String justWrong = 
        "/schedule/get?uri=http://localhost:8080&time=1000000";
    assertFalse(HttpUtils.validateHTTP_URI(justWrong));
    assertTrue(HttpUtils.validateHTTP_URI("http://www.google.nl"));
}

This is what I came up with after I removed the getPort() part, but it does not pass all my unit tests.

Production code:

public static boolean validateHTTP_URI(String uri) {
    final URI u;
    try {
        u = URI.create(uri);
    } catch (Exception e1) {
        return false;
    }
    return "http".equals(u.getScheme());
}

This is the first test that is failing because I am no longer validating the getPort() part. Hopefully somebody can help me out. I think I am not using the right class to validate URLs?
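
To see why an input such as http://localhost:8a80 is not rejected here, it can help to print what java.net.URI actually parsed. The sketch below is only an illustration (the class name UriParts and the helper describe are made up for this example). When the authority cannot be parsed as host and port, URI falls back to treating it as a registry-based authority, so no exception is thrown, getHost() is null, and getPort() is -1. A URI without an explicit port, such as http://www.google.nl, also reports -1, which is why getPort() alone cannot tell the two cases apart.

```java
import java.net.URI;

final class UriParts {
    // Hypothetical helper: summarises the components that URI extracted.
    static String describe(String s) {
        URI u = URI.create(s);
        return "scheme=" + u.getScheme()
             + " authority=" + u.getAuthority()
             + " host=" + u.getHost()
             + " port=" + u.getPort();
    }

    public static void main(String[] args) {
        // No explicit port: host is set, port is -1.
        System.out.println(describe("http://www.google.nl"));
        // Well-formed host and port.
        System.out.println(describe("http://localhost:8080"));
        // Malformed port: parsed leniently, host is null, port is -1.
        System.out.println(describe("http://localhost:8a80"));
    }
}
```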

P.S:

I don't want to connect to the server to validate that the URI is correct, at least not at this step. I only want to validate the scheme.

Alfred
  • 'I don't want to connect to the server to validate the URI is correct' Why not? It's the only conclusive way to decide. – user207421 Apr 08 '10 at 08:42
  • @EJP I disagree. The IETF-published [RFC 3986](http://www.ietf.org/rfc/rfc3986.txt) clearly establishes a rigorous syntax. For a URI to be valid, you only have to verify it against this syntax. Connecting to it verifies that a server is registered for and listening to that URI, which **1.** is an entirely separate condition from the validity of the URI itself, **2.** makes your tests depend on an external resource, and **3.** is arguably implementation specific. HTTP clients can behave in any number of non-standard ways. – Patrick M Jul 18 '14 at 19:02

2 Answers

10

Code that will pass

public static boolean validateHTTP_URI(String uri) {
    final URL url;
    try {
        url = new URL(uri);
    } catch (Exception e1) {
        return false;
    }
    return "http".equals(url.getProtocol());
}

My next question is:

I heard somewhere (from Joshua Bloch, I believe) that URL does not work properly if you don't have an internet connection. But I don't think that's true (anymore)? Could someone please elaborate?

Alfred
  • URL involves DNS. URI doesn't. – user207421 Apr 08 '10 at 08:42
  • So if I unplug the internet, URL will fail? I will test it later. – Alfred Apr 08 '10 at 10:39
  • URL uses DNS to resolve equality between different URL instances. So if you're using the URL in a map it could potentially fail if you don't have internet access. Otherwise I don't think the URL class resolves addresses until you use openConnection. – jontejj Jun 26 '13 at 16:11
-2

You could try to use this regular expression:

(?:(?<protocol>http(?:s?)|ftp)(?:\:\/\/)) (?:(?<usrpwd>\w+\:\w+)(?:\@))? (?<domain>[^/\r\n\:]+)? (?<port>\:\d+)? (?<path>(?:\/.*)*\/)? (?<filename>.*?\.(?<ext>\w{2,4}))? (?<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)* (?<bkmrk>\#.*)?

This will tell you whether a URL is valid, and it will give you the protocol value. I don't know Java, so I don't know which class you need to use to evaluate regular expressions.
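
For reference, regular expressions in Java are handled by java.util.regex.Pattern and java.util.regex.Matcher, and named groups of the form (?<name>...) are supported. The sketch below is an assumption-laden illustration, not the expression above: it uses a deliberately simplified pattern (protocol, host, optional numeric port, optional remainder), and the class name RegexUrlCheck and the group name rest are invented here. It does not handle user info or every legal URI form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexUrlCheck {
    // Simplified pattern (assumption): protocol, host, optional port, optional rest.
    private static final Pattern URL_PATTERN = Pattern.compile(
        "(?<protocol>https?|ftp)://(?<domain>[^/\\r\\n:]+)(?<port>:\\d+)?(?<rest>/.*)?");

    static boolean validateHTTP_URI(String uri) {
        if (uri == null) {
            return false;
        }
        Matcher m = URL_PATTERN.matcher(uri);
        // matches() requires the whole input to fit the pattern.
        return m.matches() && "http".equals(m.group("protocol"));
    }

    public static void main(String[] args) {
        System.out.println(validateHTTP_URI("http://localhost:8080"));
        System.out.println(validateHTTP_URI("ftp://localhost:8080"));
    }
}
```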

Tom