I'm creating a URL variable:

URL inputURL = null;
try {
    inputURL = new URL(inputUrlString);
} catch (MalformedURLException e) {
    Log.e(TAG, "Bad Parsing.");
    e.printStackTrace();

    AlertDialog ad = new AlertDialog.Builder(this)
            .setTitle("Error")
            .setMessage("URL is not HTTP-like url.")
            .setCancelable(true).create();
    ad.show();
}

If inputUrlString is "http:", "http:/" or "http:/rubbish", the constructor parses it as if it were fine, execution goes on, and everything crashes later. Are these really valid URLs? Is it good practice to parse it with the Pattern class?

While True
  • please share "catch" clause of your example as well. – izce Nov 20 '15 at 13:45
  • What do you mean exactly? I do not understand your question sorry. – Yassin Hajaj Nov 20 '15 at 13:45
  • @YassinHajaj, to my mind, a string that parses successfully and only makes openUrlConnection() fail later is not a proper URL. – While True Nov 20 '15 at 13:47
  • @fantomasdnb I still don't understand. What is not a proper url? What is the problem exactly? Please take the time to make your question clear to understand. What is the problem? – Yassin Hajaj Nov 20 '15 at 13:48
  • @YassinHajaj, I don't want my app to connect to an address like "http:sdfasdfasdfas" or something. It's missing the "//" part and the domain part, and I thought it would throw a MalformedURLException on that kind of input. Why does it think this is a valid address, and what's the right way to check that a string is a proper "http(s)://(www).website.domain" address? – While True Nov 20 '15 at 13:52
  • @fantomasdnb I have answered your question. I think you'll have to check the URL with another method. – Yassin Hajaj Nov 20 '15 at 14:04
  • @fantomasdnb, so you were getting MalformedURLException already and still asking whether those strings are valid or not!?!? – izce Nov 20 '15 at 14:15
  • @IzCe, no, I wasn't getting it. I described what's happening pretty clearly. See the words "it parses it like it's ok". – While True Nov 20 '15 at 14:26

5 Answers


Throws:
MalformedURLException - if no protocol is specified, or an unknown protocol is found, or spec is null.

As you can see in the URL javadoc, the constructor itself is quite lenient: it only rejects a missing protocol, an unknown protocol, or a null spec.

You could use UrlValidator from Apache Commons Validator, or just watch out for errors when you actually use the URL.

Aaron
  • I think the real problem is that some other code in the application @fantomasdnb is writing barfs when handed _perfectly valid URIs_. (Only one of the supposedly malformed examples is actually illegal. See [my answer](http://stackoverflow.com/questions/33828155/javas-url-not-parsing-string-properly/33829462#33829462) below.) – Kevin J. Chase Nov 20 '15 at 15:21

Separately parsing a URL only seems to make sense if you want to check, for example, whether it is an email address. You can't tell Java to detect whether the user entered rubbish. You could just catch the exception that is thrown when the browser (or whatever consumes the URL) tries to access it.

See the Oracle documentation on how to use URL in Java.

Have a look at this post, maybe this is what you are looking for.

Community
Dominik Reinert

You have two problems, only one of which you've already encountered.

1. Don't use URL!

The URL class does some weird and unexpected things that you basically never want. For example, the URL.equals method states (emphasis mine):

Two hosts are considered equivalent if both host names can be resolved into the same IP addresses [...]

Since hosts comparison requires name resolution, this operation is a blocking operation.

Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP.

Use URI instead. Its docs describe a few other shortcomings of the URL class, including:

  • Not all URIs can be represented as URLs:

    • URLs must be absolute (start with a "scheme:").

    • You can't create a URL for a scheme that doesn't already have a (stream) handler.

  • Comparison is not defined.

  • URL.equals and URL.hashCode both block while they consult the Internet.

  • Object equality (and hash codes) can vary based on your DNS setup... Two "equal" URL objects on one machine might be un-equal on another.
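To make the first two bullets concrete, here is a minimal sketch comparing the two classes on a relative reference and on a scheme that has no stream handler. The class name, the helper names and the sample strings ("myscheme:some-opaque-data", "docs/index.html") are made up for illustration, and the behavior described is that of the standard JDK classes, not necessarily Android's.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical demo class, not part of any library.
class UriVersusUrl {

    /** Returns true if java.net.URI can parse the string (syntax check only). */
    static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Returns true if the java.net.URL constructor accepts the string. */
    @SuppressWarnings("deprecation") // URL(String) is deprecated in recent JDKs
    static boolean parsesAsUrl(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A scheme with no registered stream handler, and a relative reference.
        String[] samples = { "myscheme:some-opaque-data", "docs/index.html" };
        for (String s : samples) {
            System.out.println(s + " -> URI ok: " + parsesAsUri(s)
                    + ", URL ok: " + parsesAsUrl(s));
        }
    }
}
```

Both samples should parse as a URI but be rejected by URL: the first because no handler is registered for the invented scheme, the second because it has no scheme at all.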

Yikes.

2. Your expectations are wrong.

There is nothing really wrong with a URI like "http:sdfasdfasdfas". It will even work in many browsers... if you happen to have a local host named "sdfasdfasdfas", and it serves Web pages.

The URI class docs, under "URI syntax and components", define URIs as made up of the following parts:

[scheme:]scheme-specific-part[#fragment]

Your example "http:sdfasdfasdfas" has a scheme, making it an "absolute URI". It also has a scheme-specific part, but no fragment. Regarding the scheme-specific part...

An opaque URI is an absolute URI whose scheme-specific part does not begin with a slash character ('/'). Opaque URIs are not subject to further parsing. Some examples of opaque URIs are:

  • mailto:java-net@java.sun.com
  • news:comp.lang.java
  • urn:isbn:096139210x

Your example is an opaque URI, and its scheme-specific part may be almost anything, including that weird "hostname".

Your other examples are also valid URIs, with one exception:

  • "http:" would be an absolute opaque URI, but it's missing the required scheme-specific part. (An empty string isn't good enough.)

  • "http:/" is an absolute hierarchical URI with scheme "http" and path "/".

  • "http:/rubbish" is the same, but with the path "/rubbish".
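If you want to check this classification yourself, a small sketch along these lines should do. The class and method names are hypothetical; the calls used (isOpaque, getSchemeSpecificPart, getHost, getPath) are regular java.net.URI methods.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class for inspecting how java.net.URI sees a string.
class UriShapes {

    /** Describes how java.net.URI classifies the input string. */
    static String describe(String s) {
        try {
            URI uri = new URI(s);
            if (uri.isOpaque()) {
                // Absolute URI whose scheme-specific part does not start with '/'.
                return "opaque, scheme-specific part=" + uri.getSchemeSpecificPart();
            }
            // Hierarchical URI: the host is null when there is no "//authority".
            return "hierarchical, host=" + uri.getHost() + ", path=" + uri.getPath();
        } catch (URISyntaxException e) {
            return "invalid";
        }
    }

    public static void main(String[] args) {
        String[] samples = { "http:", "http:/", "http:/rubbish", "http:sdfasdfasdfas" };
        for (String s : samples) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```

Note that the two hierarchical examples have a path but no host at all, which is most likely what the connection code trips over later.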

If you wanted the URI class (or the URL class, if you insist) to verify opaque URIs for you, it would have to "know" how valid scheme-specific parts are defined for all schemes... including ones that don't exist yet.

Conclusion

You can declare valid URIs like your example(s) to be invalid if you really want, but you'll probably have to code something of your own to throw a MalformedURLException, or preferably your own more specific exception.
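If you do go that route, one possible sketch is below. It assumes that "acceptable" means an http or https scheme plus a host that URI can parse; that rule, the class name and the method name are my own choices for illustration, not anything the URI or URL classes prescribe.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; the acceptance rule is an assumption, adjust it to your needs.
class HttpUrlCheck {

    /** Returns true only for syntactically valid http(s) URIs that carry a host. */
    static boolean isHttpUrlWithHost(String s) {
        if (s == null) {
            return false;
        }
        final URI uri;
        try {
            uri = new URI(s);
        } catch (URISyntaxException e) {
            return false; // not even a valid URI, e.g. "http:"
        }
        String scheme = uri.getScheme();
        boolean httpLike = "http".equalsIgnoreCase(scheme)
                || "https".equalsIgnoreCase(scheme);
        // getHost() is null for opaque URIs and for URIs without "//authority".
        return httpLike && uri.getHost() != null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://www.example.com/index.html", "http:", "http:/",
            "http:/rubbish", "http:sdfasdfasdfas"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isHttpUrlWithHost(s));
        }
    }
}
```

A caller could throw its own exception type when this returns false. Be aware that getHost() can also be null for authorities URI cannot parse as a server, so the check is stricter than the URI syntax itself.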

I think you'd be better off accepting the definition of "URI" that the rest of the world uses, and spending your time fixing whatever code is choking on valid URIs.

Kevin J. Chase

Is it good practice to parse it with the Pattern class?

I guess that depends on where inputUrlString is coming from. If it's something a user is typing in, it's always a good idea to scrub it.
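As a rough sketch of what such scrubbing with Pattern might look like: the regular expression below is deliberately strict and is my own assumption, not a complete URL grammar. It only accepts "http://" or "https://", a host made of letters, digits, dots and hyphens, an optional port, and an optional path without whitespace. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the pattern is a simplification, not a full URL grammar.
class HttpUrlPattern {

    private static final Pattern HTTP_URL = Pattern.compile(
            "^https?://[A-Za-z0-9.-]+(:\\d+)?(/\\S*)?$",
            Pattern.CASE_INSENSITIVE);

    /** Returns true if the input matches the simplified http(s) pattern. */
    static boolean looksLikeHttpUrl(String input) {
        return input != null && HTTP_URL.matcher(input.trim()).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "http://example.com/path", "http:", "http:/rubbish" };
        for (String s : samples) {
            System.out.println(s + " -> " + looksLikeHttpUrl(s));
        }
    }
}
```

A pattern like this will reject some legitimate URLs (user info, IPv6 literals, internationalized hosts), so treat it as a first filter on user input rather than a definition of validity.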

Kip

As you can see in the source, calling URL(String) ends up in another constructor of the URL class, namely

URL(URL, String, URLStreamHandler)

Within this constructor, there is a test that checks whether the String contains a ':' and whether the text before the ':' is a syntactically valid protocol name. See the code below.


CODE

The following portion looks for a ':' (stopping at the first '/'). When it finds one, it uses the isValidProtocol method to check whether the text before it is a well-formed protocol name. That is why "http:" is a valid String for the constructor.

for (i = start ; !aRef && (i < limit) &&
         ((c = spec.charAt(i)) != '/') ; i++) {
    if (c == ':') {

        String s = spec.substring(start, i).toLowerCase();
        if (isValidProtocol(s)) {
            newProtocol = s;
            start = i + 1;
        }
        break;
    }
}

isValidProtocol method

/*
 * Returns true if specified string is a valid protocol name.
 */
private boolean isValidProtocol(String protocol) {
    int len = protocol.length();
    if (len < 1)
        return false;
    char c = protocol.charAt(0);
    if (!Character.isLetter(c))
        return false;
    for (int i = 1; i < len; i++) {
        c = protocol.charAt(i);
        if (!Character.isLetterOrDigit(c) && c != '.' && c != '+' &&
            c != '-') {
            return false;
        }
    }
    return true;
}
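To see the effect of this check from the outside, a minimal sketch like the following can be run against the strings from the question. The class and method names are made up, and the behavior is that of the standard JDK; Android's implementation may differ in details.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class showing how lenient the URL constructor is.
class UrlLeniency {

    /** Returns true if new URL(spec) does not throw MalformedURLException. */
    @SuppressWarnings("deprecation") // URL(String) is deprecated in recent JDKs
    static boolean urlConstructorAccepts(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // Thrown for a missing or unknown protocol, not for a missing host.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = { "http:", "http:/", "http:/rubbish", "rubbish" };
        for (String s : samples) {
            System.out.println(s + " -> accepted: " + urlConstructorAccepts(s));
        }
    }
}
```

Only the protocol part decides whether the constructor throws, so a string with a known protocol and no host gets through, while a string with no protocol at all does not.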

Source

Yassin Hajaj