2

I am using the URLUTF8Encoder.java class from W3C (www.w3.org/International/URLUTF8Encoder.java).

Currently, it will encode any blank spaces ' ' into plus signs '+'.

I am having difficulty modifying the code to percent-encode the blank space into '%20'. Unfortunately, I am not too familiar with hex. Can anyone help me out? I need to modify this snippet...

else if (ch == ' ') { // space
    sbuf.append('+');
}

in the following code:

final static String[] hex = { "%00", "%01", "%02", "%03", "%04", "%05",
            "%06", "%07", "%08", "%09", "%0A", "%0B", "%0C", "%0D", "%0E",
            "%0F", "%10", "%11", "%12", "%13", "%14", "%15", "%16", "%17",
            "%18", "%19", "%1A", "%1B", "%1C", "%1D", "%1E", "%1F", "%20",
            "%21", "%22", "%23", "%24", "%25", "%26", "%27", "%28", "%29",
            "%2A", "%2B", "%2C", "%2D", "%2E", "%2F", "%30", "%31", "%32",
            "%33", "%34", "%35", "%36", "%37", "%38", "%39", "%3A", "%3B",
            "%3C", "%3D", "%3E", "%3F", "%40", "%41", "%42", "%43", "%44",
            "%45", "%46", "%47", "%48", "%49", "%4A", "%4B", "%4C", "%4D",
            "%4E", "%4F", "%50", "%51", "%52", "%53", "%54", "%55", "%56",
            "%57", "%58", "%59", "%5A", "%5B", "%5C", "%5D", "%5E", "%5F",
            "%60", "%61", "%62", "%63", "%64", "%65", "%66", "%67", "%68",
            "%69", "%6A", "%6B", "%6C", "%6D", "%6E", "%6F", "%70", "%71",
            "%72", "%73", "%74", "%75", "%76", "%77", "%78", "%79", "%7A",
            "%7B", "%7C", "%7D", "%7E", "%7F", "%80", "%81", "%82", "%83",
            "%84", "%85", "%86", "%87", "%88", "%89", "%8A", "%8B", "%8C",
            "%8D", "%8E", "%8F", "%90", "%91", "%92", "%93", "%94", "%95",
            "%96", "%97", "%98", "%99", "%9A", "%9B", "%9C", "%9D", "%9E",
            "%9F", "%A0", "%A1", "%A2", "%A3", "%A4", "%A5", "%A6", "%A7",
            "%A8", "%A9", "%AA", "%AB", "%AC", "%AD", "%AE", "%AF", "%B0",
            "%B1", "%B2", "%B3", "%B4", "%B5", "%B6", "%B7", "%B8", "%B9",
            "%BA", "%BB", "%BC", "%BD", "%BE", "%BF", "%C0", "%C1", "%C2",
            "%C3", "%C4", "%C5", "%C6", "%C7", "%C8", "%C9", "%CA", "%CB",
            "%CC", "%CD", "%CE", "%CF", "%D0", "%D1", "%D2", "%D3", "%D4",
            "%D5", "%D6", "%D7", "%D8", "%D9", "%DA", "%DB", "%DC", "%DD",
            "%DE", "%DF", "%E0", "%E1", "%E2", "%E3", "%E4", "%E5", "%E6",
            "%E7", "%E8", "%E9", "%EA", "%EB", "%EC", "%ED", "%EE", "%EF",
            "%F0", "%F1", "%F2", "%F3", "%F4", "%F5", "%F6", "%F7", "%F8",
            "%F9", "%FA", "%FB", "%FC", "%FD", "%FE", "%FF" };

public static String encode(String s) {
        StringBuffer sbuf = new StringBuffer();
        int len = s.length();
        for (int i = 0; i < len; i++) {
            int ch = s.charAt(i);
            if ('A' <= ch && ch <= 'Z') { // 'A'..'Z'
                sbuf.append((char) ch);
            } else if ('a' <= ch && ch <= 'z') { // 'a'..'z'
                sbuf.append((char) ch);
            } else if ('0' <= ch && ch <= '9') { // '0'..'9'
                sbuf.append((char) ch);
            } else if (ch == ' ') { // space
                sbuf.append('+');
            } else if (ch == '-'
                    || ch == '_' // unreserved
                    || ch == '.' || ch == '!' || ch == '~' || ch == '*'
                    || ch == '\'' || ch == '(' || ch == ')') {
                sbuf.append((char) ch);
            } else if (ch <= 0x007f) { // other ASCII
                sbuf.append(hex[ch]);
            } else if (ch <= 0x07FF) { // non-ASCII <= 0x7FF
                sbuf.append(hex[0xc0 | (ch >> 6)]);
                sbuf.append(hex[0x80 | (ch & 0x3F)]);
            } else { // 0x7FF < ch <= 0xFFFF
                sbuf.append(hex[0xe0 | (ch >> 12)]);
                sbuf.append(hex[0x80 | ((ch >> 6) & 0x3F)]);
                sbuf.append(hex[0x80 | (ch & 0x3F)]);
            }
        }
        return sbuf.toString();
    }

Thanks!

Zach Saucier
littleK
  • why do you need the + to be %20? they are both equivalent? http://www.permadi.com/tutorial/urlEncoding/ –  Mar 26 '10 at 17:53
  • Please see my response below, thanks. – littleK Mar 26 '10 at 18:02
  • 1
    @fuzzy lollipop: Alas, no. HTTP says it should be `%20`, it's the HTML specification that allows + instead of space. So, http://www.example.com/something%20here.php?q=a+string+with+spaces is valid, but http://www.example.com/something+here.php?q=a+string+with+spaces is not. – Powerlord Mar 26 '10 at 18:26

6 Answers

5

You might want to check out Apache Commons' Codec package, which is probably a lot more robust: http://commons.apache.org/codec/. The class you're using is about 14 years old and only produces one type of encoding (application/x-www-form-urlencoded), which REQUIRES spaces to be encoded as '+'. If you're trying to do standard URL encoding (which wants spaces as %20), you'll need to use a different package entirely.

Kylar
4

Why are you using this class instead of the API method?

java.net.URLEncoder.encode("your string", "utf-8");

And why is it a problem that spaces are encoded as + characters? That is exactly how URL safe character encoding is supposed to work.

jarnbjo
  • I am developing on a BlackBerry platform, which unfortunately does not include the java.net API. The problem is that when I form a URL to make a request, e.g. http://api.netflix.com/titles?more_stuff&term=Forrest+Gump, it will not work unless the term looks like this: Forrest%20Gump (according to the Netflix API that I am using) – littleK Mar 26 '10 at 18:01
  • 1
    So why don't you simply remove the special if branch for spaces (`else if (ch == ' ') { sbuf.append('+'); }`) in the code you've pasted? In that case, spaces should fall into the "other ASCII" branch and be encoded as you expect. – jarnbjo Mar 26 '10 at 18:07
  • 4
    "And why is it a problem that spaces are encoded as + characters? That is exactly how URL safe character encoding is supposed to work." HTML allows + for spaces, HTTP requires %20 for spaces. – Powerlord Mar 26 '10 at 18:29
  • Powerlord - for some reason the Netflix API doesn't like the "+" symbol – littleK Mar 26 '10 at 19:17
3

I won't ask why you're doing this, and just answer your question directly. Please read other answers to determine if you really want to be modifying this code. If you just remove the code:

else if (ch == ' ') { // space
   sbuf.append('+');
} 

It will do what you want, because the space character will be taken care of by the code:

else if (ch <= 0x007f) { // other ASCII
   sbuf.append(hex[ch]);
} 
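
For reference, here is a minimal sketch of the method with that branch removed. The class name PercentEncoder and the loop that builds the lookup table are my own shorthand for illustration and are not part of the W3C source; the branch logic otherwise follows the code in the question. A space is character 0x20 (decimal 32), so the "other ASCII" branch looks up entry 32 of the table, which is "%20".

```java
final class PercentEncoder {
    // Lookup table "%00".."%FF", built in a loop instead of typed out
    // (hypothetical replacement for the literal array in the question).
    private static final String[] HEX = new String[256];
    static {
        final String digits = "0123456789ABCDEF";
        for (int i = 0; i < 256; i++) {
            HEX[i] = "%" + digits.charAt(i >> 4) + digits.charAt(i & 0x0F);
        }
    }

    static String encode(String s) {
        StringBuffer sbuf = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            int ch = s.charAt(i);
            if (('A' <= ch && ch <= 'Z') || ('a' <= ch && ch <= 'z')
                    || ('0' <= ch && ch <= '9')) {
                sbuf.append((char) ch);          // letters and digits
            } else if (ch == '-' || ch == '_' || ch == '.' || ch == '!'
                    || ch == '~' || ch == '*' || ch == '\''
                    || ch == '(' || ch == ')') {
                sbuf.append((char) ch);          // unreserved punctuation
            } else if (ch <= 0x007F) {           // other ASCII, now including space
                sbuf.append(HEX[ch]);
            } else if (ch <= 0x07FF) {           // two-byte UTF-8 sequence
                sbuf.append(HEX[0xC0 | (ch >> 6)]);
                sbuf.append(HEX[0x80 | (ch & 0x3F)]);
            } else {                             // three-byte UTF-8 sequence
                sbuf.append(HEX[0xE0 | (ch >> 12)]);
                sbuf.append(HEX[0x80 | ((ch >> 6) & 0x3F)]);
                sbuf.append(HEX[0x80 | (ch & 0x3F)]);
            }
        }
        return sbuf.toString();
    }
}
```

With this version, PercentEncoder.encode("Forrest Gump") returns Forrest%20Gump, and a literal plus sign comes out as %2B.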
Rob Heiser
  • That did the trick, thanks very much for your help. I am not crazy about using such an old class, but like I mentioned in another post, I have no other choice (as I am developing on a BlackBerry platform) Thanks! – littleK Mar 26 '10 at 19:16
1

Just do this:

String str = "Hello World+You";
String encodedStr = URLEncoder.encode(str, "UTF-8");
encodedStr = encodedStr.replace("+", "%20");
System.out.println("Encoded String: " + encodedStr);
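
A short sketch of why the replacement is safe, assuming the standard java.net.URLEncoder is available: it emits '+' only for spaces and encodes a literal '+' in the input as %2B, so every '+' left in the output stands for a space. The wrapper class SpaceAsPercent20 is a hypothetical name used only for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SpaceAsPercent20 {
    static String encode(String s) {
        try {
            // URLEncoder writes '+' only for spaces; a literal '+' becomes "%2B",
            // so replacing every remaining '+' cannot corrupt the data.
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on the Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For the string above, SpaceAsPercent20.encode("Hello World+You") returns Hello%20World%2BYou.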
Chandan
  • 1
    I wonder, could there be + in some meaning other than a space? Anyway, this rather seems like a fugly hack than a clean solution. – Vlasec Jul 02 '13 at 11:37
  • It is slow and it includes unnecessary memory allocations (this can easily become hot code when processing lots of data). A regex Pattern is created from "+" implicitly by replace(), so it should be pre-compiled if you are URL-encoding in a loop. It also requires handling UnsupportedEncodingException (nonsensical, since UTF-8 is used) – Nishi Mar 07 '16 at 17:07
0

You can use the built-in java.net.URI class, which is normally used via its static factory method, as in URI.create("http://example.com/search?param=42"). When a parameter contains a literal space, you can use the multi-argument constructor instead:

// Note: this constructor throws the checked URISyntaxException.
URI uri = new URI("http", // scheme
    null,                 // user authentication info
    "example.com",        // domain
    -1,                   // port (-1 = undefined, so the default port 80 applies)
    "/search",            // path
    "param=four and two", // one or more parameters
    null);                // fragment (appended with the # char)
System.out.println(uri);
// OUTPUT:
// http://example.com/search?param=four%20and%20two

If you look inside this particular URI constructor you'll see that -1 leaves the port undefined, so no port appears in the result and the scheme's default (80 for http) applies; explicitly passing 80 as the constructor value will create a URL like http://example.com:80/search?param=four%20and%20two, which you probably do not want.

The same constructor can be used to build only the query part of the URL which you can append to an existing string:

URI uri2 = new URI(null, null, null, -1, null, "param=four and two", null);
System.out.println(uri2);
// OUTPUT:
// ?param=four%20and%20two

Might be worth mentioning that a URI is not the same as a URL: file:/// is a valid URI but not a valid URL.
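
One caveat, shown as a small sketch: as far as I know this constructor only quotes characters that are illegal in a query, so characters that are legal there, such as '&', '=' and '+', pass through unchanged. If a parameter value can itself contain them, it has to be encoded separately before it is handed to the constructor. The helper class UriQueryDemo and the example.com host are hypothetical and used only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriQueryDemo {
    // Builds http://example.com/search?<query> with the multi-argument constructor.
    static String build(String query) {
        try {
            return new URI("http", null, "example.com", -1,
                    "/search", query, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Here UriQueryDemo.build("q=a&b c") gives http://example.com/search?q=a&b%20c: the space is escaped, but the '&' still separates two parameters.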

ccpizza
-1

It's working correctly; it should work with + as well as it would with %20.

Maybe try java.net.URLEncoder.encode("url", "UTF-8")?

Dean J