4

I'm using java.net.URL.getPort() to extract the port number from a URL. Most of the time this works great. However, when the URL contains a right bracket character "]" it fails:

new URL("http://abc.com:123/abc.mp3").getPort();
 returns: (int) 123

But if the URL contains "]" I get:

new URL("http://abc.com:123/abc].mp3").getPort();
 returns: (int) -1

What am I doing wrong?

EDIT #1: As a test, I pasted this same code into a non-Android Java app and the port number was correctly returned, so this appears to be an anomaly with the Android SDK.

Mike Lowery

5 Answers

5

If your URL contains characters that are not valid in URLs, you have to use a URL-encoded string. The way to do that in Java seems to be the URI class.

new URI("http", null, "abc.com", 123, "/abc].mp3", null, null).toURL().getPort();

If you already have a URL string:

URL url = new URL("http://abc.com:123/abc].mp3");

Then this works for me:

new URI(
    url.getProtocol(),
    null,
    url.getHost(),
    url.getPort(),
    url.getPath(),
    null,
    null);

But then again, this relies on url.getPort(), which you said didn't work. When I test on Java 6, though, new URL("http://abc.com:123/abc].mp3").getPort(); actually works for me, so maybe it fails only on Android? If it doesn't work, I think it's best to use a third-party library for this. Apache HttpClient, which is included in Android, seems to have some extra functionality for URLs: see org.apache.http.client.utils
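
For a URL that arrives as a single string, the two steps above can be combined. This is only a minimal sketch: the names `EncodeUrlSketch` and `toEncodedUri` are made up for illustration, and it assumes `java.net.URL` parses the raw string leniently, which the question reports for desktop Java but not for Android.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class EncodeUrlSketch {
    // Hypothetical helper: split the raw string with java.net.URL, then let the
    // multi-argument URI constructor percent-encode illegal characters.
    static URI toEncodedUri(String raw) {
        try {
            URL url = new URL(raw); // assumes lenient parsing of the raw string
            return new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                           url.getPort(), url.getPath(), url.getQuery(), url.getRef());
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode: " + raw, e);
        }
    }

    public static void main(String[] args) {
        URI uri = toEncodedUri("http://abc.com:123/abc].mp3");
        System.out.println(uri.toASCIIString()); // http://abc.com:123/abc%5D.mp3
        System.out.println(uri.getPort());       // 123
    }
}
```

Note that the multi-argument URI constructors always quote "%", so a string that is already percent-encoded would be encoded twice.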

See also HTTP URL Address Encoding in Java

Jonas
  • 2
    In other words, square brackets are not valid in URLs: http://www.ietf.org/rfc/rfc3986.txt – CommonsWare Jan 29 '11 at 21:19
  • 1
    This causes a java.net.MalformedURLException to be thrown because everything gets encoded, not just the invalid characters. – Mike Lowery Jan 29 '11 at 21:44
  • 1
    @DiskCrasher: You were right about that. I updated the answer to use the `URI` class instead. It is working now, but the API is clumsy :( – Jonas Jan 30 '11 at 00:24
  • But this assumes you've already extracted portions of the URL string (context, host, etc.) In my case, I'm given a URL string that I need to properly encode. Attempting to use URL for this has failed. – Mike Lowery Jan 30 '11 at 01:19
  • @DiskCrasher: Yes, it was a clumsy API. I have now updated my answer again with more helpful information. I hope this helps you. – Jonas Jan 30 '11 at 10:03
  • @Jonas: See my edit. It looks to be a problem in Android only. My workaround is to look for a ":" in the returned host name. If it exists, extract the port number from it. Probably hammers ipv6 addresses but I'm not worried about that just now! – Mike Lowery Jan 31 '11 at 04:52
2
"http://abc.com:123/abc].mp3"

] is not allowed in the path part of a URI, so this is not a URL. However, you can modify the regular expression in the spec to get this information:

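    // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
    // Group 4 captures the host, group 5 the digits after the colon.
    String expr = "^(([^:/?#]+):)?(//([^:/?#]*):([\\d]*))?";
    Matcher matcher = Pattern.compile(expr)
                             .matcher("http://abc.com:123/abc].mp3");
    if (matcher.find()) {
      String port = matcher.group(5); // null when the URL has no ":port" part
      System.out.println(port);
    }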
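
The snippet above can be wrapped in a small helper that mirrors the `getPort()` convention of returning -1 when no port is present. This is a sketch under assumptions: `RegexPortSketch` and `portOf` are hypothetical names, and the pattern does not handle user info (`user:pw@host`) or bracketed IPv6 hosts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexPortSketch {
    // Same expression as in the answer; group 5 holds the port digits.
    private static final Pattern AUTHORITY =
            Pattern.compile("^(([^:/?#]+):)?(//([^:/?#]*):([\\d]*))?");

    // Hypothetical helper: returns the port, or -1 if none can be extracted.
    static int portOf(String raw) {
        Matcher m = AUTHORITY.matcher(raw);
        if (m.find()) {
            String port = m.group(5);
            if (port != null && !port.isEmpty()) {
                try {
                    return Integer.parseInt(port);
                } catch (NumberFormatException e) {
                    return -1; // digits too large for an int
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(portOf("http://abc.com:123/abc].mp3")); // 123
        System.out.println(portOf("http://abc.com/abc].mp3"));     // -1
    }
}
```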

Despite the name, URLEncoder doesn't encode URLs. It should only be used to encode parameters in the query part when the server is expecting application/x-www-form-urlencoded encoded data. The URI and URL classes behave as documented - they aren't going to help you here.
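
To illustrate the point about URLEncoder, here is a small sketch of what it does to a whole URL versus a single query value. The class name `UrlEncoderDemo`, the helper `formEncode`, and the `title` parameter are made up for the example; the `Charset` overload of `encode` assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class UrlEncoderDemo {
    // Thin wrapper around URLEncoder, which produces
    // application/x-www-form-urlencoded output.
    static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Applied to a whole URL, the delimiters are encoded too:
        System.out.println(formEncode("http://abc.com:123/abc].mp3"));
        // http%3A%2F%2Fabc.com%3A123%2Fabc%5D.mp3

        // Applied to a single query value, which is its intended use:
        System.out.println("title=" + formEncode("abc].mp3 live & loud"));
        // title=abc%5D.mp3+live+%26+loud
    }
}
```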

McDowell
  • I understand URLEncoder is only helpful for encoding the query part of the URL. The trick is extracting only that part from the URL string. Even then I'm not sure that solves my problem as I've been going around in circles with this all day. This seems like such a common task I'm baffled why it's so difficult to do, at least in Java. – Mike Lowery Jan 30 '11 at 01:30
  • @DiskCrasher - I do not agree that Java should provide a specific API for parsing something that's like a URL but is slightly different in an undefined way - it isn't clear how many rules set down by the URI spec such an API would ignore. I don't care much for @Jonas' solution because this relies on the implementation being tolerant of junk data - it works by accident, not design and may behave differently on another JVM. If there's a bug in my code for extracting the port, I'd be interested to know what you're seeing. – McDowell Jan 30 '11 at 12:23
  • 1
    @McDowell: I hear you, but illegal characters are everywhere on the Internet. In my case this is happening in the file names of directory listings. Browsers can handle them, why not Java? I didn't use your code because I implemented a simpler workaround. Note that this problem does not exist in Java 6, just Android. – Mike Lowery Jan 31 '11 at 04:55
  • @DiskCrasher - Relying on undocumented implementation behaviour means that changes in future Dalvik implementations may break your code - nothing in the doc says how it has to behave with junk data. The Java API will likely be supporting the currently documented behaviour in 15 years time; if you document it, you can't change it without breaking consumers of the API. Browsers don't have this restriction. This problem is best solved by bespoke code or a specialized API. Managing such dependencies gives you control over how to respond to shifting trends in invalid data. – McDowell Feb 02 '11 at 12:12
  • @DiskCrasher - I understand the need to be pragmatic. If your software has a short shelf-life, or is targeted at one runtime, my arguments may be moot. The decision about what should and should not be in standard APIs is an interesting one, and I expect that the answer depends on the platform in question. – McDowell Feb 02 '11 at 12:16
  • @McDowell: Undocumented or not, I expected Java to function the same regardless of platform. Apparently that's not the case. The only safe solution I see is to have the user enter the server and port using text boxes instead of trying to extract them from a user-supplied URL string. Or implement a fancy regular expression (which I'm not very good at). – Mike Lowery Feb 05 '11 at 02:33
1

Here is a simpler way to extract the port from URLs that may use a scheme other than HTTP, e.g. JNDI connection URLs:

int port = 80; // assumption of default port in the URL
Pattern p = Pattern.compile(":\\d+"); // look for the first occurrence of colon followed by a number
Matcher matcher = p.matcher(urlStr);
if (matcher.find()) {
    String portStrWithColon = matcher.group();
    if (portStrWithColon.length() > 1) {
        String portStr = portStrWithColon.substring(1);
        try {
            port = Integer.parseInt(portStr);
        } catch (NumberFormatException e) {
            // handle
        }
    }
}
return port;
Rami Jaamour
1

According to RFC1738 the ] character is unsafe:

Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

You should encode either the individual character that you want to add, or run the whole string through a URL encoder. Try this:

new URL("http://abc.com:123/abc%5D.mp3").getPort();
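
Encoding only the individual characters can be sketched as follows. This is an illustration under assumptions, not a general URL encoder: `UnsafeCharEncoderSketch` and `encodeUnsafe` are hypothetical names, the character set is just the "unsafe" list quoted above, and spaces, non-ASCII characters, and bracketed IPv6 hosts (which legitimately use "[" and "]") are not handled.

```java
class UnsafeCharEncoderSketch {
    // The characters RFC 1738 lists as unsafe in the passage quoted above.
    private static final String UNSAFE = "{}|\\^~[]`";

    // Hypothetical helper: percent-encode only those characters and leave
    // everything else, including ":" and "/", untouched.
    static String encodeUnsafe(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (UNSAFE.indexOf(c) >= 0) {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeUnsafe("http://abc.com:123/abc].mp3"));
        // http://abc.com:123/abc%5D.mp3
    }
}
```
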
dave.c
  • Running it through URLEncoder causes the problem I mentioned above where more than just the unsafe characters are encoded. For example, the ":" and "/" are also encoded in "http://". I don't want that. Your second example is what I want, but that's not what I'm getting from URLEncoder. Does Java not have a simple function that can do this? – Mike Lowery Jan 29 '11 at 23:04
  • @DiskCrasher I'd use the approach suggested by @Jonas, though I agree it feels a little clunky – dave.c Jan 30 '11 at 01:11
0

String encodedURL = new URI("http", null, "//abc.com:8080/abc[d].jpg", null, null).toASCIIString();
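
The one-liner above passes everything after the scheme as the path argument of the five-argument URI constructor, which quotes illegal path characters and then re-parses the resulting string. A minimal sketch of what that appears to produce, with `FiveArgUriSketch` and `encode` as hypothetical names:

```java
import java.net.URI;
import java.net.URISyntaxException;

class FiveArgUriSketch {
    // Hypothetical helper: rest is everything after "scheme:", for example
    // "//abc.com:8080/abc[d].jpg". It is passed as the path argument.
    static URI encode(String scheme, String rest) {
        try {
            return new URI(scheme, null, rest, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode: " + rest, e);
        }
    }

    public static void main(String[] args) {
        URI uri = encode("http", "//abc.com:8080/abc[d].jpg");
        System.out.println(uri.toASCIIString()); // http://abc.com:8080/abc%5Bd%5D.jpg
        System.out.println(uri.getPort());       // 8080
    }
}
```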

user207421