129

How does one encode query parameters for a URL in Java? I know this seems like an obvious, already-asked question.

There are two subtleties I'm not sure of:

  1. Should spaces be encoded in the URL as "+" or as "%20"? In Chrome, if I type in "http://google.com/foo=?bar me", Chrome rewrites it with the space encoded as %20.
  2. Is it necessary/correct to encode colons ":" as %3A? Chrome doesn't.

Notes:

  • java.net.URLEncoder.encode doesn't seem to fit; it appears to be meant for encoding data for form submission. For example, it encodes a space as + instead of %20, and it encodes the colon, which isn't necessary.
  • java.net.URI doesn't encode query parameters.
Johan Sjöberg
waterlooalex
  • This question looks useful: http://stackoverflow.com/questions/444112/how-do-i-encode-uri-parameter-values – waterlooalex Mar 16 '11 at 19:14
  • The structure of the query part is server-dependent, though most expect `application/x-www-form-urlencoded` key/value pairs. See here for more: http://illegalargumentexception.blogspot.com/2009/12/java-safe-character-handling-and-url.html – McDowell Mar 16 '11 at 20:18

9 Answers

147

java.net.URLEncoder.encode(String s, String encoding) can help too. It follows the HTML form encoding application/x-www-form-urlencoded.

URLEncoder.encode(query, "UTF-8");

On the other hand, percent-encoding (also known as URL encoding) encodes a space as %20. The colon is a reserved character, but it is permitted unescaped in the query component, so : can remain a colon after encoding.
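To make the form-encoding behaviour concrete, here is a minimal sketch of what URLEncoder does to a space and a colon. The class and method names (`FormEncodingDemo`, `formEncode`) are made up for illustration, and the `Charset` overload assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class FormEncodingDemo {

    // Encodes a single parameter value as application/x-www-form-urlencoded.
    static String formEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(formEncode("bar me")); // bar+me  (space becomes '+')
        System.out.println(formEncode("a:b"));    // a%3Ab   (colon is escaped)
    }
}
```

Note that only the parameter value is passed in, not the whole URL.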

Buhake Sindi
  • I mentioned that I didn't think that does URL encoding; instead it encodes data to be submitted via a form. Comments? – waterlooalex Mar 16 '11 at 18:50
  • That's because `URLEncoder` is conformed to `application/x-www-form-urlencoded` MIME format (which is a valid HTML form encoding). I'm assuming that's not what you're looking for. – Buhake Sindi Mar 16 '11 at 18:54
  • Right, so doesn't that disqualify your answer? Or, are you saying its output is still valid, just stricter than necessary? – waterlooalex Mar 16 '11 at 18:55
  • @Alex Black, I just updated my comment. I'm assuming you're looking for encoding to conform to URI as specified in RFC2396. – Buhake Sindi Mar 16 '11 at 18:59
  • Btw, if you're using HttpClient 4, you don't need to as HttpClient does it for you. – Buhake Sindi Mar 16 '11 at 19:02
  • Yes, RFC2396 looks like the encoding I want. It looks to me like URLEncoder.encode is for http://www.w3.org/TR/html401/interact/forms.html#form-content-type – waterlooalex Mar 16 '11 at 19:02
  • @Elite: I can't ever seem to figure out what you mean :) Yes, I am using HttpClient 4, so far its not doing it for me. Are you saying there is a method in it that does? – waterlooalex Mar 16 '11 at 19:03
  • No, If you're doing an `HttpGet` then encoding is necessary, but generally, passing parameters with `HttpParams`, HttpClient 4 knows how to encode them. – Buhake Sindi Mar 16 '11 at 19:37
  • I ended up using URLEncoder.encode and replacing "+" with "%20" – waterlooalex Mar 17 '11 at 12:38
  • Because one of the (3rd party) sites I am sending HTTP requests to does not decode "+" to " ", but it does decode "%20" to " ". – waterlooalex Mar 17 '11 at 17:13
  • It encodes slashes to "%2F", shouldn't it leave the URL slashes as they are? – golimar Oct 31 '13 at 11:43
  • @golimar No, it shouldn't. You are supposed to give it the parameter value only and not the whole URL. Consider the example `http://example.com/?url=http://example.com/?q=c&sort=name`. Should it encode `&sort=name` or not? There is no way to distinguish the value from the URL. That is the exact reason why you need value encoding in the first place. – Pijusn Aug 23 '14 at 10:35
  • But actually, slash is a legal character in querystring parameter values. – Stijn de Witt May 08 '17 at 14:40
24

Unfortunately, URLEncoder.encode() does not produce valid percent-encoding (as specified in RFC 3986).

URLEncoder.encode() encodes everything just fine, except space is encoded to "+". All the Java URI encoders that I could find only expose public methods to encode the query, fragment, path parts etc. - but don't expose the "raw" encoding. This is unfortunate as fragment and query are allowed to encode space to +, so we don't want to use them. Path is encoded properly but is "normalized" first so we can't use it for 'generic' encoding either.

Best solution I could come up with:

return URLEncoder.encode(raw, "UTF-8").replaceAll("\\+", "%20");

If replaceAll() is too slow for you, I guess the alternative is to roll your own encoder...
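For what "roll your own encoder" might look like, here is a rough sketch that percent-encodes every UTF-8 byte except the RFC 3986 unreserved characters. The class name `PercentEncoder` is hypothetical, and this is one possible approach under those assumptions rather than a vetted implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, written only to illustrate the idea.
final class PercentEncoder {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Percent-encodes each UTF-8 byte unless it is an RFC 3986 "unreserved" character.
    static String encode(String raw) {
        byte[] bytes = raw.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder(bytes.length * 3);
        for (byte b : bytes) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (isUnreserved(c)) {
                out.append((char) c);
            } else {
                out.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return out.toString();
    }

    // unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }
}
```

With this sketch, `PercentEncoder.encode("bar me")` yields `bar%20me`. It is not byte-for-byte identical to the URLEncoder one-liner: URLEncoder leaves `*` unescaped and escapes `~`, while this version does the opposite.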

EDIT: I had this code in here first which doesn't encode "?", "&", "=" properly:

//don't use - doesn't properly encode "?", "&", "="
new URI(null, null, null, raw, null).toString().substring(1);
Kosta
  • `+` is a perfectly valid encoding of a space. – Lawrence Dol Dec 15 '15 at 23:00
  • @LawrenceDol it's true but sometimes `+` may be interpreted incorrectly - take a look at C# https://blogs.msdn.microsoft.com/yangxind/2006/11/08/dont-use-net-system-uri-unescapedatastring-in-url-decoding/ – Ilya Serbis Apr 14 '16 at 08:42
  • This. I compared various alternatives against Javascript's `encodeURIComponent` method output, and this was the only exact match for the ones I tried (queries with spaces, Turkish and German special characters). – Utku Özdemir Nov 27 '17 at 10:43
  • Ahmet+Mehmet Demir => `Ahmet%2BMehmet+Demir` , According to my understanding the only problem here is MIME type `application/x-www-form-urlencoded`. In such cases space is encoded to `+` char, if the intention was searching two entries in a web form, like google search by a GET request. URI RFC allows `+` char as a valid char. So, it doesn't need to be escaped normally. – Davut Gürbüz Jan 08 '22 at 21:50
16

EDIT: URIUtil is no longer available in more recent versions, better answer at Java - encode URL or by Mr. Sindi in this thread.


URIUtil of Apache httpclient is really useful, although there are some alternatives

URIUtil.encodeQuery(url);

> For example, it encodes space as "+" instead of "%20"

Both are perfectly valid in the right context. If you really prefer %20, you could apply a string replace afterwards.

Johan Sjöberg
  • I would have to agree. Use HttpClient, you will be much happier. – DaShaun Mar 16 '11 at 18:44
  • That look promising, got a link by chance? I'm googling but finding many. – waterlooalex Mar 16 '11 at 18:44
  • This method doesn't seem to be present in HttpClient 4.1? http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/client/utils/URIUtils.html – waterlooalex Mar 16 '11 at 18:49
  • @Alex, hmm that's annoying, I've always used that routine with good results. One idea is to grab the source code from the 3 release since they now obviously didn't want to maintain it anymore. – Johan Sjöberg Mar 16 '11 at 18:50
  • A long time ago I copied the class from the old HTTP commons (and altered it so it was a single class) and put it on gist: https://gist.github.com/agentgt/3011049 – Adam Gent Apr 25 '13 at 22:48
  • You bet this is annoying. Currently, there is a `URLEncodedUtils.encodeFormFields` which is a private static method. Wouldn't it be reasonable to this method be declared as public? – Cacovsky Feb 19 '14 at 17:12
  • `URIUtil.encodeWithinQuery` is what you would use to encode an individual query parameter, which is what the original question seemed to be asking. – Jesse Glick Mar 21 '14 at 20:33
10

It is not necessary to encode a colon as %3A in the query, although doing so is not illegal.

URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
query       = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

It also seems that only percent-encoded spaces are valid, as a space is neither an ALPHA nor a DIGIT, nor any of the other characters listed.

Look to the URI specification (RFC 3986) for more details.
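The grammar quoted above can be turned into a small character check. This is only a sketch of the `query` production as excerpted here; the class and method names (`QueryGrammar`, `mayAppearLiterally`) are invented for the example.

```java
// Hypothetical helper mirroring the grammar excerpt; not a library API.
final class QueryGrammar {
    private static final String SUB_DELIMS = "!$&'()*+,;=";

    // unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"  (ASCII only)
    static boolean isUnreserved(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // query = *( pchar / "/" / "?" ), ignoring pct-encoded triplets:
    // true if the character may appear in a query without being percent-encoded.
    static boolean mayAppearLiterally(char c) {
        return isUnreserved(c)
                || SUB_DELIMS.indexOf(c) >= 0
                || c == ':' || c == '@'
                || c == '/' || c == '?';
    }
}
```

Under this reading, `mayAppearLiterally(':')` is true and `mayAppearLiterally(' ')` is false, which matches the conclusion that a colon may stay as-is while a space must be escaped. Being allowed by the grammar does not mean the server treats the character as plain data: `&`, `=` and `+` pass this check but usually carry meaning in form-encoded queries.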

Edwin Buck
  • But doing so can change the meaning of the URI, since the interpretation of the query string is up to the server. If you are producing an `application/x-www-form-urlencoded` query string, either is fine. If you are fixing up a URL that the user typed/pasted in, `:` should be left alone. – tc. Mar 26 '13 at 18:44
  • @tc. You are right, if colon is being used as a general delimiter (page 12 of the RFC); however, if it is not being used as a general delimiter, then both encodings should resolve identically. – Edwin Buck Mar 27 '13 at 21:24
  • You also have to be careful as URLs are not really a subset of URI: http://adamgent.com/post/25161273526/urls-are-not-a-subset-of-uris – Adam Gent Apr 25 '13 at 22:51
  • A colon is %3A not %3B (thats a semicolon), for anybody who is manually encoding – Marcelino Lucero III Jan 27 '22 at 21:51
4

The built in Java URLEncoder is doing what it's supposed to, and you should use it.

A "+" or "%20" are both valid replacements for a space character in a URL. Either one will work.

A ":" should be encoded, as it's a separator character. i.e. http://foo or ftp://bar. The fact that a particular browser can handle it when it's not encoded doesn't make it correct. You should encode them.

As a matter of good practice, be sure to use the method that takes a character encoding parameter. UTF-8 is generally used there, but you should supply it explicitly.

URLEncoder.encode(yourUrl, "UTF-8");
rfeak
  • `+` is only a representation of space in `application/x-www-form-urlencoded`; it is not guaranteed to work even when restricted to HTTP. Similarly, `:` is valid *in a query string* and *should not* be converted to `%3A`; a server can choose to interpret them differently. – tc. Mar 26 '13 at 18:38
  • This method also encodes the slashes and other characters that are part of the whole URL, e.g. `http://` to `http%3A%2F%2F`, which is not correct – To Kra May 22 '15 at 10:47
  • @ToKra you are not supposed to encode the `http://` part. The method is for query parameters and encoded form data. If, however, you wanted to pass the URL of another website as a query parameter, THEN you would want to encode it to avoid confusing the URL parser. – beldaz Jul 15 '16 at 10:00
  • @tc My reading of https://www.w3.org/TR/html4/interact/forms.html#h-17.13.3.3 is that all GET form data is encoded as `application/x-www-form-urlencoded` content type. Doesn't that mean is must work for HTTP? – beldaz Jul 15 '16 at 10:07
1

I just want to add another way to resolve this problem.

If your project depends on spring web, you can use their utils.

import org.springframework.web.util.UriUtils;

import java.nio.charset.StandardCharsets;

UriUtils.encode("vip:104534049:5", StandardCharsets.UTF_8);

Output:

vip%3A104534049%3A5

aristotll
0
String param="2019-07-18 19:29:37";
param="%27"+param.trim().replace(" ", "%20")+"%27";

I found that URLEncoder.encode(param, "UTF-8") did not work for a datetime (timestamp) value, so I used the manual replacement above.

0

The white space character " " is converted into a + sign when using URLEncoder.encode. This differs from JavaScript's encodeURIComponent, which encodes the space character as %20. It is nevertheless valid for form-encoded query strings, where spaces are represented by +. The %20 form is generally used to represent spaces in the URI itself (the URL part before ?).
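Whether a + is read back as a space depends on which decoder the receiving side applies. The sketch below contrasts form-style decoding with the generic decoding done by java.net.URI. The class name, method names and the example.com URLs are placeholders, and the `Charset` overload of URLDecoder assumes Java 10 or later.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class PlusDecodingDemo {

    // Form-style decoding: both '+' and "%20" become a space.
    static String formDecode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    // Generic URI decoding: percent-escapes are decoded, '+' is left as a plus.
    static String uriQuery(String url) {
        return URI.create(url).getQuery();
    }

    public static void main(String[] args) {
        System.out.println(formDecode("hello+sir"));                          // hello sir
        System.out.println(formDecode("hello%20sir"));                        // hello sir
        System.out.println(uriQuery("http://example.com/?para=hello+sir"));   // para=hello+sir
        System.out.println(uriQuery("http://example.com/?para=hello%20sir")); // para=hello sir
    }
}
```

So %20 survives both kinds of decoder, while + only works when the receiver applies form decoding.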

Janisito
-3

If spaces are the only problem in your URL, the code below worked fine for me:

String url = "http://www.xyz.com?para=hello sir";
URL myUrl = new URL(url.replace(" ", "%20"));

Example: if url is

http://www.xyz.com?para=hello sir

then myUrl is

http://www.xyz.com?para=hello%20sir