241

I am expecting

System.out.println(java.net.URLEncoder.encode("Hello World", "UTF-8"));

to output:

Hello%20World

(20 is ASCII Hex code for space)

However, what I get is:

Hello+World

Am I using the wrong method? What is the correct method I should be using?

Brant Bobby
Cheok Yan Cheng
  • 3
    the class name is indeed confusing, and many people have used it wrongly. however they don't notice it, because when URLDecoder is applied, the original value is restored, so + or %20 doesn't really matter for them. – irreputable Jan 19 '11 at 21:07

19 Answers

255

This behaves as expected. The URLEncoder implements the HTML Specifications for how to encode URLs in HTML forms.

From the javadocs:

This class contains static methods for converting a String to the application/x-www-form-urlencoded MIME format.

and from the HTML Specification:

application/x-www-form-urlencoded

Forms submitted with this content type must be encoded as follows:

  1. Control names and values are escaped. Space characters are replaced by `+'

You will have to replace it, e.g.:

System.out.println(java.net.URLEncoder.encode("Hello World", "UTF-8").replace("+", "%20"));
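
As a quick check of the workaround above, here is a minimal sketch. The class and method names are made up for illustration, and the `Charset` overload of `URLEncoder.encode` assumes Java 10 or later (on older JDKs pass `"UTF-8"` as in the original line). It also shows that a literal `+` in the input is already turned into `%2B` before the replacement runs, and that `replace` (literal) and `replaceAll` (regex, hence the escaped `\\+`) give the same result here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SpaceAsPercent20 {
    // Hypothetical helper: form-encode, then turn the '+' that stands for a space into %20.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Same idea with replaceAll, which takes a regex, so the plus sign must be escaped.
    static String encodeWithRegex(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replaceAll("\\+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(encode("Hello World"));          // Hello%20World
        System.out.println(encode("1+1 = 2"));              // 1%2B1%20%3D%202
        System.out.println(encodeWithRegex("Hello World")); // Hello%20World
    }
}
```

This only changes how the space is written; the other differences between form encoding and RFC 3986 percent-encoding are not addressed by it.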
Sanghyun Lee
dogbane
  • 23
    well This is an answer indeed , rather than replacing isn't there a java library or a function to perform the task/? – not 0x12 Apr 22 '13 at 11:59
  • 5
    The plus sign needs to be escaped `t.println(java.net.URLEncoder.encode("Hello World", "UTF-8").replace("\\+", "%20"));` – George Aug 16 '13 at 09:35
  • 28
    @congliu that's incorrect - you're probably thinking of replaceAll() which works with regex - replace() is simple character sequence replacement. – CupawnTae Sep 25 '13 at 13:57
  • 15
    Yes @congliu the good way is : URLEncoder.encode("Myurl", "utf-8").replaceAll("\\+", "%20"); – eento Oct 09 '13 at 17:10
  • 1
    @eento Why the solution provided by `congliu` is incorrect? It's almost the same as yours. – Alston Oct 03 '14 at 11:05
  • 9
    Downvoted because short-sighted solutions like this are dangerous. It's not just about the space character, see the [RFC 3986](https://tools.ietf.org/html/rfc3986) regarding URL encoding. – pyb Sep 23 '16 at 15:48
  • 3
    @pyb I wish I could downvote your comment. The question was specifically about the space character...why everything has to be generalized? In the same vein, if users wish to replace *all* appearances of plus signs to `%20`, this answer wouldn't be 100% precise since they would need to use `String#replaceAll(regex, replacement)`, in this case `"\\+"` would be mandatory but again, this answer effectively answers the precise question made by @dogbane. – Clint Eastwood May 17 '17 at 13:25
  • 18
    @ClintEastwood This answer encourages the use of java.net.URLEncoder which does not the job of what was originally asked. And so this answer suggests a patch, using replace(), on top of it. Why not? Because this solution is bug prone and could lead to 20 other similar questions but with a different character. That's why I said this was shortsighted. – pyb May 17 '17 at 13:31
  • It implements the job of encoding form parameter names and values. Not URLs. – user207421 Sep 15 '17 at 11:46
  • 1
    @pyb I think any answer could lead to something, as any code can have some side-effect that wasn't originally anticipated. Can you point out a relevant problem with code posted in this answer? – eis May 31 '18 at 05:24
  • 2
    @eis As noticed by the asker, `URLEncoder.encode` doesn't do what the asker needs. The code posted in this answer is patching it by calling `String.replace`. This is weak (only works for the space character) and overly complicated: just use the proper encoding. See for instance https://stackoverflow.com/a/31595036/2223027 To use an analogy, if the answer was "Why doesn't `2+2` gives 5?", it's like suggesting "just do `2+2+1` and you'll get 5". – pyb May 31 '18 at 19:00
  • 5
    @pyb it saddens me to see that doing one replace() call is considered more complicated than adding a full-fledged library like guava, which really introduces a lot more complexity to the software. My point was that is there a relevant problem with just replacing the space character? I've yet to see any real-world example where the difference between the two encodings would cause an issue. To my knowledge, the other differences are characters [listed here](https://www.leveluplunch.com/java/examples/encode-url-string/), which are trivial to add to replacement list if needed. – eis Jun 01 '18 at 12:55
  • when I do this I get `Hello%2520World` – user1870400 Jul 13 '18 at 09:33
  • 1
    What if INPUT CONTAINS `+` already ?? ;) – Antoniossss May 19 '22 at 09:55
  • this is wrong answer but 251 upvotes lol. – walv Oct 18 '22 at 21:32
  • @Antoniossss The original `+` becomes `%2B` which will decode right back to a plus sign – TheMadsen Jan 04 '23 at 08:14
76

A space is encoded as %20 in URLs, and as + in submitted form data (content type application/x-www-form-urlencoded). You need the former.

Using Guava:

dependencies {
     compile 'com.google.guava:guava:23.0'
     // or, for Android:
     compile 'com.google.guava:guava:23.0-android'
}

You can use UrlEscapers:

String encodedString = UrlEscapers.urlFragmentEscaper().escape(inputString);

Don't use String.replace, this would only encode the space. Use a library instead.

pyb
  • It also works for Android, com.google.guava:guava:22.0-rc1-android. – Bevor May 12 '17 at 19:22
  • 1
    @Bevor rc1 means 1st Release Candidate, i.e. a version not yet approved for general release. If you can, pick a version without snapshot, alpha, beta, rc as they are known to contain bugs. – pyb May 16 '17 at 22:02
  • 1
    @pyb Thanks, but I will update the libs anyway when my project will be finished. Means, I will not go to prod without final versions. And it still takes a lot of weeks, so I guess there is a final version then. – Bevor May 17 '17 at 16:36
  • 1
    Unfortunately, Guava doesn't provide a decoder, unlike Apache's [URLCodec](https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/net/URLCodec.html). – Benny Bottema Mar 09 '18 at 10:58
28

This class performs application/x-www-form-urlencoded-type encoding rather than percent-encoding, so replacing a space with + is the correct behaviour.

From javadoc:

When encoding a String, the following rules apply:

  • The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
  • The special characters ".", "-", "*", and "_" remain the same.
  • The space character " " is converted into a plus sign "+".
  • All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
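
The rules quoted above can be checked directly. This is a small sketch with a made-up class name; it assumes Java 10+ for the `Charset` overload and always uses UTF-8.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FormEncodingRules {
    // Thin wrapper around the JDK call so the examples below stay short.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(enc("azAZ09"));  // azAZ09       (alphanumerics unchanged)
        System.out.println(enc(".-*_"));    // .-*_         (special characters unchanged)
        System.out.println(enc(" "));       // +            (space becomes a plus sign)
        System.out.println(enc("/?&="));    // %2F%3F%26%3D (everything else is %xy)
        System.out.println(enc("\u00e9"));  // %C3%A9       (two UTF-8 bytes for e-acute)
        System.out.println(enc("~"));       // %7E          (not in the "remain the same" list)
    }
}
```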
axtavt
  • @axtavt Nice explanation. But I still have some questions. In the `url`, the space should be interpreted as `%20`. So we need to do `url.replaceAll("\\+", "%20")`? And if it's javascript, we shouldn't use `escape` function. Use `encodeURI` or `encodeURIComponent` instead. That's what I thought. – Alston Oct 03 '14 at 10:41
  • 2
    @Stallman this is Java, not JavaScript. Totally different languages. – Charles Wood Nov 24 '14 at 22:28
25

Encode Query params

org.apache.commons.httpclient.util.URIUtil
    URIUtil.encodeQuery(input);

OR if you want to escape chars within URI

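public static String escapeURIPathParam(String input) {
  StringBuilder resultStr = new StringBuilder();
  // Work on UTF-8 bytes so that non-ASCII characters become valid %XX sequences
  for (byte b : input.getBytes(java.nio.charset.StandardCharsets.UTF_8)) {
   int ch = b & 0xFF;
   if (isUnsafe(ch)) {
    resultStr.append('%');
    resultStr.append(toHex(ch / 16));
    resultStr.append(toHex(ch % 16));
   } else {
    resultStr.append((char) ch);
   }
  }
  return resultStr.toString();
 }

 // Converts a value in the range 0..15 to an upper-case hex digit
 private static char toHex(int ch) {
  return (char) (ch < 10 ? '0' + ch : 'A' + ch - 10);
 }

 private static boolean isUnsafe(int ch) {
  // Control characters and every byte of a multi-byte UTF-8 sequence
  if (ch < 32 || ch >= 127)
   return true;
  return " %$&+,/:;=?@<>#".indexOf(ch) >= 0;
 }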
fmucar
11

Hello+World is how a browser will encode form data (application/x-www-form-urlencoded) for a GET request and this is the generally accepted form for the query part of a URI.

http://host/path/?message=Hello+World

If you sent this request to a Java servlet, the servlet would correctly decode the parameter value. Usually the only time there are issues here is if the encoding doesn't match.

Strictly speaking, there is no requirement in the HTTP or URI specs that the query part be encoded using application/x-www-form-urlencoded key-value pairs; the query part just needs to be in a form the web server accepts. In practice, this is unlikely to be an issue.

It would generally be incorrect to use this encoding for other parts of the URI (the path for example). In that case, you should use the encoding scheme as described in RFC 3986.

http://host/Hello%20World
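
To make the path/query distinction concrete, here is a hedged sketch using only the JDK. The class and method names are invented for illustration, and the host and parameter name simply mirror the examples above. The multi-argument `java.net.URI` constructor quotes characters that are illegal in each component, while `URLEncoder` produces the form-style value.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PathVersusQuery {
    // Builds http://host<path>?<query>, letting URI quote illegal characters such as spaces.
    static String build(String path, String query) {
        try {
            return new URI("http", "host", path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Form-style encoding of a single query value (Java 10+ overload).
    static String formValue(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Space in the path and in the query are both written as %20 by URI.
        System.out.println(build("/Hello World", "message=Hello World"));
        // http://host/Hello%20World?message=Hello%20World

        // A browser-style form value uses '+' instead.
        System.out.println("http://host/path/?message=" + formValue("Hello World"));
        // http://host/path/?message=Hello+World
    }
}
```

Note that the `URI` constructor leaves characters that are legal in a query, such as `&` and `=`, untouched, so it is not a substitute for escaping individual parameter values.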

More here.

Community
McDowell
  • So, HOW can people properly encode strings? I cannot find ANY valid solution in this complete post on SO... – Zordid Apr 03 '23 at 10:17
7

If you want to encode URI path components, you can also use standard JDK functions, e.g.

public static String encodeURLPathComponent(String path) {
    try {
        return new URI(null, null, path, null).toASCIIString();
    } catch (URISyntaxException e) {
        // do some error handling
    }
    return "";
}

The URI class can also be used to encode different parts of or whole URIs.

Update: I just realized that this does not work if there is a colon before a slash in the path or when the part before the colon is not valid URI scheme.
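
A minimal sketch of the same idea, restricted to paths that start with a slash (which sidesteps the colon-before-slash case mentioned in the update). The class and method names are hypothetical, and the checked exception is wrapped for brevity.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathEncodingDemo {
    // Encodes an absolute path (one that starts with '/') using the JDK's URI class.
    static String encodePath(String absolutePath) {
        try {
            // Arguments: scheme, host, path, fragment
            return new URI(null, null, absolutePath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid path: " + absolutePath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodePath("/Hello World"));     // /Hello%20World
        System.out.println(encodePath("/caf\u00e9 menu"));  // /caf%C3%A9%20menu
        System.out.println(encodePath("/a:b/c"));           // /a:b/c
    }
}
```

`toASCIIString()` is what turns non-ASCII characters into UTF-8 percent escapes; `toString()` would leave them as they are.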

MrTux
6

I've just been struggling with this on Android too and stumbled upon Uri.encode(String, String). It is specific to Android (android.net.Uri), but it might be useful to some.

static String encode(String s, String allow)

https://developer.android.com/reference/android/net/Uri.html#encode(java.lang.String, java.lang.String)

Chrispix
4

The other answers either present a manual string replacement, URLEncoder which actually encodes for HTML format, Apache's abandoned URIUtil, or using Guava's UrlEscapers. The last one is fine, except it doesn't provide a decoder.

Apache Commons Codec provides the URLCodec, which encodes and decodes according to URL format rfc3986.

String encoded = new URLCodec().encode(str);
String decoded = new URLCodec().decode(encoded);

If you are already using Spring, you can also opt to use its UriUtils class as well.

Community
Benny Bottema
4

Although quite old, nevertheless a quick response:

Spring provides UriUtils. With it you can specify how to encode a value and which part of a URI it belongs to, e.g.

encodePathSegment
encodePort
encodeFragment
encodeUriVariables
....

I use them because we are already using Spring, i.e. no additional library is required!

LeO
  • 1
    Is there anything else in Spring that does URL encoding? I ask because when I make a test request using `getForObject` (part of `RestTemplate`) the URL it writes out leaves commas unencoded, but `UriUtils.encode(...)` encodes commas, which means my `MockRestServiceServer` doesn't match the path if I use the output from `UriUtils.encode`. – IpsRich Oct 16 '20 at 10:56
  • I think this answers my question: https://stackoverflow.com/a/20885702/1999993 – IpsRich Oct 16 '20 at 11:58
4

If you are using jetty then org.eclipse.jetty.util.URIUtil will solve the issue.

String encoded_string = URIUtil.encodePath(not_encoded_string).toString();
gourab ghosh
4

It's not a one-liner, but you can use:

URL url = new URL("https://some-host.net/dav/files/selling_Rosetta Stone Case Study.png.aes");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
System.out.println(uri.toString());

This will give you an output:

https://some-host.net/dav/files/selling_Rosetta%20Stone%20Case%20Study.png.aes
tchudyk
  • This works. Also I was able to get URL object back from URI object as I needed an input stream from it. I did it this way `uri.toURL().openStream();` – Oliver Apr 19 '22 at 08:02
3

This worked for me

// encode(...) returns the encoded String, not a URLEncoder instance
String ul = new org.apache.catalina.util.URLEncoder().encode("MY URL");
Hitesh Kumar
2

I was already using Feign, so its UriUtils was available to me, but Spring's UriUtils was not.

<!-- https://mvnrepository.com/artifact/io.github.openfeign/feign-core -->
<dependency>
    <groupId>io.github.openfeign</groupId>
    <artifactId>feign-core</artifactId>
    <version>11.8</version>
</dependency>

My Feign test code:

import feign.template.UriUtils;

System.out.println(UriUtils.encode("Hello World"));

Outputs:

Hello%20World

As the class name suggests, it encodes URIs and not URLs, but the OP asked about URIs and not URLs.

System.out.println(UriUtils.encode("https://some-host.net/dav/files/selling_Rosetta Stone Case Study.png.aes"));

Outputs:

https%3A%2F%2Fsome-host.net%2Fdav%2Ffiles%2Fselling_Rosetta%20Stone%20Case%20Study.png.aes

rjdkolb
  • Even feign FAILS to properly encode any arbitrary string, as they - for if you ask my opinion stupid reasons - coded their functions to "skip already encoded strings". Meaning you cannot use their function for strings like "%07"... :( – Zordid Apr 03 '23 at 10:14
0

"+" is correct. If you really need %20, then replace the Plusses yourself afterwards.

Warning: This answer is heavily disputed (+8 vs. -6), so take this with a grain of salt.
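
Whether `+` is acceptable depends on who decodes the value. As a small illustration (hypothetical class name, Java 10+ `Charset` overload assumed), a form-style decoder such as `java.net.URLDecoder` accepts both spellings of a space, and a real plus sign survives the round trip as `%2B`:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DecodeRoundTrip {
    // Form-style decoding: '+' and %20 both decode to a space.
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // Encode and decode again to show that nothing is lost on the way.
    static String roundTrip(String s) {
        return decode(URLEncoder.encode(s, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(decode("Hello+World"));    // Hello World
        System.out.println(decode("Hello%20World"));  // Hello World
        System.out.println(decode("1%2B1"));          // 1+1
        System.out.println(roundTrip("a + b"));       // a + b
    }
}
```

A consumer that does not apply form decoding (for example, something that reads a URL path literally) would see the `+` as a plus sign, which is where the two encodings stop being interchangeable.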

Daniel
  • 7
    There may be a problem if the initial string really contained a + character. – Alexis Dufrenoy Jun 11 '13 at 15:29
  • 22
    @Traroth - Not really. A `+` character in the original text is supposed to be encoded as `%2B`. – Ted Hopp Aug 19 '13 at 21:22
  • saying that `+` is correct without knowing the context is, at least, pedantic. Downvoted. Read other answers to know about when + or %20 are to be used. – Clint Eastwood May 17 '17 at 13:47
  • @ClintEastwood: Can you tell me about any usecase in that the + character for spaces isn't correct in URLs? Except when there is a non-conforming URL parser on the other side? – Daniel May 18 '17 at 19:31
  • @Daniel sure, not saying "incorrect" but unsuitable? yes. Analytics tools often use query params with values separated by a certain character, for example "+". In that case, using "+" instead of "%20" would be wrong. "+" is used for escaping spaces in a form, while the "percentage encoding" (a.k.a. URL encoding) is more oriented to URLs. – Clint Eastwood May 19 '17 at 17:38
0

Try below approach:

Add a new dependency

<!-- https://mvnrepository.com/artifact/org.apache.tomcat/tomcat-catalina -->
<dependency>
    <groupId>org.apache.tomcat</groupId>
    <artifactId>tomcat-catalina</artifactId>
    <version>10.0.13</version>
</dependency>

Now do as follows:

String str = "Hello+World"; // For "Hello World", decoder is not required
// import java.net.URLDecoder;
String newURL = URLDecoder.decode(str, StandardCharsets.UTF_8);
// import org.apache.catalina.util.URLEncoder;
System.out.println(URLEncoder.DEFAULT.encode(newURL, StandardCharsets.UTF_8));

You'll get the output as:

Hello%20World
-1

Use MyUrlEncode.URLencoding(String url, String enc) to handle the problem:

    public class MyUrlEncode {
    static BitSet dontNeedEncoding = null;
    static final int caseDiff = ('a' - 'A');
    static {
        dontNeedEncoding = new BitSet(256);
        int i;
        for (i = 'a'; i <= 'z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = 'A'; i <= 'Z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = '0'; i <= '9'; i++) {
            dontNeedEncoding.set(i);
        }
        dontNeedEncoding.set('-');
        dontNeedEncoding.set('_');
        dontNeedEncoding.set('.');
        dontNeedEncoding.set('*');
        dontNeedEncoding.set('&');
        dontNeedEncoding.set('=');
    }
    public static String char2Unicode(char c) {
        if(dontNeedEncoding.get(c)) {
            return String.valueOf(c);
        }
        StringBuffer resultBuffer = new StringBuffer();
        resultBuffer.append("%");
        char ch = Character.forDigit((c >> 4) & 0xF, 16);
            if (Character.isLetter(ch)) {
            ch -= caseDiff;
        }
        resultBuffer.append(ch);
            ch = Character.forDigit(c & 0xF, 16);
            if (Character.isLetter(ch)) {
            ch -= caseDiff;
        }
         resultBuffer.append(ch);
        return resultBuffer.toString();
    }
    private static String URLEncoding(String url,String enc) throws UnsupportedEncodingException {
        StringBuffer stringBuffer = new StringBuffer();
        if(!dontNeedEncoding.get('/')) {
            dontNeedEncoding.set('/');
        }
        if(!dontNeedEncoding.get(':')) {
            dontNeedEncoding.set(':');
        }
        byte [] buff = url.getBytes(enc);
        for (int i = 0; i < buff.length; i++) {
            stringBuffer.append(char2Unicode((char)buff[i]));
        }
        return stringBuffer.toString();
    }
    private static String URIEncoding(String uri , String enc) throws UnsupportedEncodingException { // encode the request parameters
        StringBuffer stringBuffer = new StringBuffer();
        if(dontNeedEncoding.get('/')) {
            dontNeedEncoding.clear('/');
        }
        if(dontNeedEncoding.get(':')) {
            dontNeedEncoding.clear(':');
        }
        byte [] buff = uri.getBytes(enc);
        for (int i = 0; i < buff.length; i++) {
            stringBuffer.append(char2Unicode((char)buff[i]));
        }
        return stringBuffer.toString();
    }

    public static String URLencoding(String url , String enc) throws UnsupportedEncodingException {
        int index = url.indexOf('?');
        StringBuffer result = new StringBuffer();
        if(index == -1) {
            result.append(URLEncoding(url, enc));
        }else {
            result.append(URLEncoding(url.substring(0 , index),enc));
            result.append("?");
            result.append(URIEncoding(url.substring(index+1),enc));
        }
        return result.toString();
    }

}
IloveIniesta
-1

Check out the java.net.URI class.

Fredrik Widerberg
-2

Am I using the wrong method? What is the correct method I should be using?

Yes, the method java.net.URLEncoder.encode wasn't made for converting " " to "%20", according to the spec (source).

The space character " " is converted into a plus sign "+".

Even though this is not the correct method, you can modify it to: System.out.println(java.net.URLEncoder.encode("Hello World", "UTF-8").replaceAll("\\+", "%20")); Have a nice day =).

Pregunton
  • You are suggesting to use a method that is not adequate (`URLEncoder.encode`) and patch it using `replaceAll` which would only work in this specific case. Use the correct class and method instead, see other answers. – pyb Aug 02 '17 at 21:10
  • @pyb looks like you can't understand what I've written. I've never said "I suggest using it", I said "you can". Please read and understand before you write. – Pregunton Aug 21 '17 at 20:56
  • 1
    This is a questions and answers website, not a regular message board where people chat. If you have side comments, use the comments. Longer talk? Use the chat. Don't post code you disagree with as an answer. Please read and understand the rules of this site before contributing and lecturing others. – pyb Aug 21 '17 at 22:10
  • 1
    I'm upvoting it back because most other solutions provide the same advice. No "specific cases" were provided to prove this method wrong. Using apache commons with try-catch blocks or dependencies is too much of a hassle for a method that can be effectively patched with replaceAll. – Eugene Kartoyev Jul 15 '18 at 23:21
-8

use character-set "ISO-8859-1" for URLEncoder
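
For reference, a quick sketch of what changing the character set actually affects (made-up class name, Java 10+ `Charset` overload assumed). The charset only decides which bytes non-ASCII characters turn into; the space is still written as `+` either way.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CharsetComparison {
    static String latin1(String s) {
        return URLEncoder.encode(s, StandardCharsets.ISO_8859_1);
    }

    static String utf8(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(latin1("Hello World")); // Hello+World
        System.out.println(utf8("Hello World"));   // Hello+World
        System.out.println(latin1("\u00e9"));      // %E9     (one ISO-8859-1 byte)
        System.out.println(utf8("\u00e9"));        // %C3%A9  (two UTF-8 bytes)
    }
}
```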

j0k