I've been experimenting with various bits of Java code trying to come up with something that will encode a string containing quotes, spaces and "exotic" Unicode characters and produce output that's identical to JavaScript's encodeURIComponent function.
My torture test string is: "A" B ± "
If I enter the following JavaScript statement in Firebug:
encodeURIComponent('"A" B ± "');
Then I get:
"%22A%22%20B%20%C2%B1%20%22"
Here's my little test Java program:
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodingTest
{
    public static void main(String[] args) throws UnsupportedEncodingException
    {
        String s = "\"A\" B ± \"";
        System.out.println("URLEncoder.encode returns "
            + URLEncoder.encode(s, "UTF-8"));
        System.out.println("getBytes returns "
            + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
    }
}
This program outputs:
URLEncoder.encode returns %22A%22+B+%C2%B1+%22
getBytes returns "A" B ± "
Close, but no cigar! What is the best way of encoding a UTF-8 string using Java so that it produces the same output as JavaScript's encodeURIComponent?
EDIT: I'm using Java 1.4, moving to Java 5 shortly.
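One direction that might work is to post-process the URLEncoder output. This is a sketch under assumptions, not a verified answer. URLEncoder.encode differs from encodeURIComponent in two ways: it turns a space into "+", and it percent-encodes ! ' ( ) ~, which encodeURIComponent leaves alone. The class name UriComponentEncoder and its encode method are hypothetical names used only for illustration. The code sticks to String.replaceAll so that it should compile on Java 1.4.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: the name and API are illustrative, not from any library.
final class UriComponentEncoder {

    private UriComponentEncoder() {
    }

    // Encodes s as UTF-8 with URLEncoder, then undoes the differences
    // from JavaScript's encodeURIComponent.
    static String encode(String s) {
        String encoded;
        try {
            encoded = URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is not supported");
        }
        return encoded
                // A literal '+' in the input is already %2B at this point,
                // so any remaining '+' stands for a space.
                .replaceAll("\\+", "%20")
                // Restore the characters encodeURIComponent leaves unescaped.
                .replaceAll("%21", "!")
                .replaceAll("%27", "'")
                .replaceAll("%28", "(")
                .replaceAll("%29", ")")
                .replaceAll("%7E", "~");
    }

    public static void main(String[] args) {
        // \u00B1 is the plus-minus sign from the torture test string.
        System.out.println(encode("\"A\" B \u00B1 \""));
        // prints %22A%22%20B%20%C2%B1%20%22
    }
}
```

For the torture test string this produces the same output as the Firebug result above. Edge cases such as unpaired surrogate characters may still behave differently between the two functions, so this is a starting point rather than a guaranteed match.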