23

Looking on the W3 Schools URL encoding webpage, it says that @ should be encoded as %40, and that space should be encoded as %20.

I've tried both URLEncoder and URI, but neither does the above properly:

import java.net.URI;
import java.net.URLEncoder;

public class Test {
    public static void main(String[] args) throws Exception {

        // Prints me%40home.com (CORRECT)
        System.out.println(URLEncoder.encode("me@home.com", "UTF-8"));

        // Prints Email+Address (WRONG: Should be Email%20Address)
        System.out.println(URLEncoder.encode("Email Address", "UTF-8"));

        // http://www.home.com/test?Email%20Address=me@home.com
        // (WRONG: it has not encoded the @ in the email address)
        URI uri = new URI("http", "www.home.com", "/test", "Email Address=me@home.com", null);
        System.out.println(uri.toString());
    }
}

For some reason, URLEncoder encodes the email address correctly but not spaces, and URI encodes spaces correctly but not email addresses.

How should I encode these two parameters to be consistent with what w3schools says is correct (or is w3schools wrong)?

John Farrelly
  • 4
    If you are looking at w3schools.com, then you are doing it wrong. Refer to [this](http://w3fools.com/) – Srinivas Jan 14 '13 at 15:59
  • @Srinivas the webservice I am using explicitly ignores requests unless parameters are encoded as explained on the w3schools webpage :( – John Farrelly Jan 14 '13 at 16:02
  • 1
    `URLEncoder` does not encode as per the URL specification but as per the `application/x-www-form-urlencoded` MIME format (which is what most application servers expect for parameter keys/values). The `URI` type encodes as per its documentation; that is, it isn't a complete URL builder. Note that different parts of the URI have different rules. See [this post](http://illegalargumentexception.blogspot.co.uk/2009/12/java-safe-character-handling-and-url.html) for more analysis. – McDowell Jan 14 '13 at 16:05
  • 1
    @McDowell Yes, I think I should have asked how to get Java to do what JavaScript's encodeURIComponent() does. I'll check out your lib. – John Farrelly Jan 14 '13 at 16:30

2 Answers

43

I think the answer from @fge is the right one. However, I was using a 3rd party webservice that relied on the encoding outlined in the W3Schools article, so I followed the answer to the question "Java equivalent to JavaScript's encodeURIComponent that produces identical output?":

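import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public static String encodeURIComponent(String s) {
    String result;

    try {
        // URLEncoder produces application/x-www-form-urlencoded output,
        // so undo the places where it differs from encodeURIComponent().
        result = URLEncoder.encode(s, "UTF-8")
                .replace("+", "%20")   // space: + becomes %20
                .replace("%21", "!")
                .replace("%27", "'")
                .replace("%28", "(")
                .replace("%29", ")")
                .replace("%7E", "~");
    } catch (UnsupportedEncodingException e) {
        // Not expected in practice: every JVM is required to support UTF-8.
        result = s;
    }

    return result;
}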
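
As a usage sketch, the method above can be applied to each key and each value separately before joining them with = and &. The class name QueryStringSketch and the buildQuery helper are hypothetical names used only for illustration; they are not part of the original answer. The encoding step follows the same replace-after-URLEncoder approach.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class, for illustration only.
class QueryStringSketch {

    // Form-encode, then undo the differences from JavaScript's encodeURIComponent().
    static String encodeURIComponent(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8")
                    .replace("+", "%20")
                    .replace("%21", "!")
                    .replace("%27", "'")
                    .replace("%28", "(")
                    .replace("%29", ")")
                    .replace("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Encode each key and value on its own, then join with the literal separators.
    static String buildQuery(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(encodeURIComponent(p.getKey()) + "=" + encodeURIComponent(p.getValue()));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("Email Address", "me@home.com");
        params.put("note", "a&b=c");
        // Prints Email%20Address=me%40home.com&note=a%26b%3Dc
        System.out.println(buildQuery(params));
    }
}
```

Because each component is encoded before the separators are added, an & or = inside a value comes out as %26 or %3D and cannot be mistaken for a parameter boundary.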
John Farrelly
  • 4
    You forgot the & symbol, which is important for decoding the URL (for either the GET or POST method), because it's the symbol that separates the keys in the request – Giorgos Fandomas Aug 10 '15 at 09:49
  • I am compelled to point out that w3schools is not the W3C. They are quite, quite different. – Mike B Nov 06 '18 at 09:24
16

URI syntax is defined by RFC 3986 (the permissible content of a query string is defined in section 3.4). Java's URI is specified against the older RFC 2396 (amended by RFC 2732), with a few deviations noted in its Javadoc; both RFCs allow @ in a query string.

Section 3.4 defines query as *( pchar / "/" / "?" ), and the pchar grammar rule is defined by:

pchar = unreserved / pct-encoded / sub-delims / ":" / "@"

Which means a @ is legal in a query string.

Trust URI. It will do the correct, "legal" stuff.
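
A minimal sketch of what this looks like in practice, using the same host, path and query as the question. The class name UriQueryDemo and the rawQuery helper are hypothetical, introduced only for this illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, for illustration only.
class UriQueryDemo {

    // Builds a URI from components and returns the query exactly as it would be sent.
    static String rawQuery(String query) {
        try {
            return new URI("http", "www.home.com", "/test", query, null).getRawQuery();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The space is not legal in a query, so it is escaped; @ is legal and is kept.
        // Prints Email%20Address=me@home.com
        System.out.println(rawQuery("Email Address=me@home.com"));

        // & and = are also legal query characters, so URI leaves them alone.
        // Prints q=a&b
        System.out.println(rawQuery("q=a&b"));
    }
}
```

The second call shows the flip side: URI only escapes characters that are illegal in a query, so it cannot tell a separator from an & or = that is meant to be part of a value. Such characters need to be percent-encoded before the query is handed to URI.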

Finally, if you have a look at the Javadoc of URLEncoder, you see that it states:

This class contains static methods for converting a String to the application/x-www-form-urlencoded MIME format.

That format is not the same thing as a query string as defined by the URI specification.
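
The difference can be seen with a short round trip through URLEncoder and URLDecoder. This is only a sketch; the class name FormEncodingDemo and its two helper methods are hypothetical names chosen for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, for illustration only.
class FormEncodingDemo {

    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    static String formDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // In the form format a space is written as +
        System.out.println(formEncode("Email Address"));     // Prints Email+Address

        // A form decoder accepts both spellings of a space
        System.out.println(formDecode("Email+Address"));     // Prints Email Address
        System.out.println(formDecode("Email%20Address"));   // Prints Email Address
    }
}
```

A receiver that decodes parameters as form data therefore accepts either spelling, while a receiver that follows the generic URI rules only treats %20 as a space.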

fge
  • I think the question I should have asked is how to get Java to encode a URL the same way as JavaScript's encodeURIComponent, since this is what the receiving webservice expects: http://stackoverflow.com/questions/607176/java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-outpu – John Farrelly Jan 14 '13 at 16:28
  • Since then, I have developed a library which does URI templates (RFC 6570), which is even more powerful ;) – fge Jul 05 '13 at 05:51
  • 5
    This is weird... the Javadoc for URI states that it follows RFC 2396, even in [Java 8](http://docs.oracle.com/javase/8/docs/api/java/net/URI.html), although [RFC 2396](https://tools.ietf.org/html/rfc2396) is from 1998 and has been **obsoleted** by [RFC 3986](https://tools.ietf.org/html/rfc3986) since 2005 – arcuri82 Mar 28 '17 at 19:31