0

I would like to URL-encode a string as UTF-8. The code is:

URLEncoder.encode("http://www.example.com/sf?s=191ae04f&an=马赛克.jpg","UTF-8");

and the result is:

http%3A%2F%2Fwww.example.com%2Fsf%3Fs%3D191ae04f%26an%3D%C2%ED%C8%FC%BF%CB.jpg

As you can see, the punctuation characters, such as : / ? &, have been percent-encoded as well, but the result I would like to see is:

http://www.example.com/sf?s=191ae04f&an=%C2%ED%C8%FC%BF%CB.jpg 

Is there anything wrong?

Kenster
ctsu

4 Answers

4

You need to URL-encode only the individual components of the URL, such as the query string parameter names/values which may contain characters beyond the ASCII range, not the entire URL.

String an = URLEncoder.encode("马赛克.jpg", "UTF-8");
String url = "http://www.example.com/sf?s=191ae04f&an=" + an;
// ...
Kenster
BalusC
  • Thanks. I knew that splitting the string would work, but that approach would add overhead to my application. Is there any way to convert the entire URL to UTF-8 without encoding the punctuation characters? – ctsu Mar 13 '12 at 12:51
  • Parse the URL then. First split on `?`. The left side is the scheme+domain+path. The right side is the query string. The query string is in turn further parseable by splitting on `&`. Every part is the individual parameter `name=value` pair. This is in turn further parseable by splitting on `=`. The left side is the name and the right side is the value. Now you can URL-encode the individual names and values. Finally just glue all the parts together again into a new URL. – BalusC Mar 13 '12 at 13:03
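The parsing steps described in the comment above can be sketched roughly as follows. This is only a minimal illustration, not the commenter's own code: the class name `QueryValueEncoder` and the method `encodeQueryValues` are hypothetical, and the sketch assumes the URL has no `#fragment` and that the query string is not already percent-encoded. Note that `URLEncoder` emits form encoding, so a space becomes `+`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: encodes only the parameter names and values of a URL.
class QueryValueEncoder {

    static String encodeQueryValues(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url; // no query string, nothing to encode
        }
        // Left side: scheme + domain + path, kept as-is (including the '?').
        StringBuilder out = new StringBuilder(url.substring(0, q + 1));
        // Right side: the query string, split into name=value pairs on '&'.
        String[] pairs = url.substring(q + 1).split("&");
        for (int i = 0; i < pairs.length; i++) {
            if (i > 0) {
                out.append('&');
            }
            String pair = pairs[i];
            int eq = pair.indexOf('=');
            if (eq < 0) {
                out.append(encode(pair)); // a name without a value
            } else {
                out.append(encode(pair.substring(0, eq)))
                   .append('=')
                   .append(encode(pair.substring(eq + 1)));
            }
        }
        return out.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For the URL in the question, `QueryValueEncoder.encodeQueryValues(...)` leaves `http://www.example.com/sf?s=191ae04f&an=` untouched and encodes only the file name. With UTF-8 the three Chinese characters become `%E9%A9%AC%E8%B5%9B%E5%85%8B`; the `%C2%ED%C8%FC%BF%CB` sequence shown in the question appears to be the GBK byte sequence rather than UTF-8.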
1

You URL-encoded the whole string, which is what you would do to include it as a parameter value in another URL, for example:

http://www.yyy.com?forward=http%3A%2F%2Fwww.xxx.com%2Fsf%3Fs%3D191ae04f%26an%3D%C2%ED%C8%FC%BF%CB.jpg

However, what you seem to want is to encode only the parameter values of your original URL. So you have to split the URL, URL-encode only the parameter values, and put it back together again.
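To illustrate the case where encoding the whole string is the right thing to do, here is a small sketch that embeds one URL as a parameter value of another. The class name `ForwardLink` and the method `forwardTo` are made up for this example, and the `forward` parameter name is taken from the sample URL above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: passes a complete URL as the "forward" parameter of another URL.
class ForwardLink {

    static String forwardTo(String base, String target) {
        try {
            // The whole target is a single parameter value, so all of it is encoded,
            // including ':', '/', '?', '=' and '&'.
            return base + "?forward=" + URLEncoder.encode(target, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Here the punctuation must be escaped, otherwise the `&an=...` part of the target would be read as a second parameter of the outer URL.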

michael667
  • Thanks. I knew that splitting the string would work, but that approach would add overhead to my application. Is there any way to convert the entire URL to UTF-8 without encoding the punctuation characters? – ctsu Mar 13 '12 at 12:55
  • See here: http://stackoverflow.com/questions/444112/how-do-i-encode-uri-parameter-values – michael667 Mar 13 '12 at 13:01
0

As answered in Java - encode URL, you can use something like

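// Requires java.net.URL and java.net.URI.
public URL parseUrl(String s) throws Exception {
    URL u = new URL(s);
    // The multi-argument URI constructor quotes the characters that are
    // illegal in each component, so every part is handled by its own rules.
    return new URI(
            u.getProtocol(),
            u.getAuthority(),
            u.getPath(),
            u.getQuery(),
            u.getRef())
            .toURL();
}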

The reason is that different parts of the URL need to be encoded differently.

Alas, in your case, the URLEncoder should only be applied to the value of your query parameter.
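One caveat with the `URI` approach above: the multi-argument constructors quote illegal characters, but non-ASCII characters such as Chinese text are legal in a `java.net.URI` and stay unescaped until `toASCIIString()` is called. The following sketch shows that variant; the class name `WholeUrlEncoder` and the method `toAsciiUrl` are hypothetical, and it assumes the input is not already percent-encoded (a literal `%` would be escaped again).

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper: percent-encodes a whole URL component by component.
class WholeUrlEncoder {

    static String toAsciiUrl(String s) {
        try {
            URL u = new URL(s); // used only to split the string into components
            URI uri = new URI(
                    u.getProtocol(),
                    u.getAuthority(),
                    u.getPath(),
                    u.getQuery(),
                    u.getRef());
            // toASCIIString() escapes the remaining non-ASCII characters as UTF-8 bytes.
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URL: " + s, e);
        }
    }
}
```

Applied to the URL in the question, this keeps `:`, `/`, `?`, `=` and `&` intact and turns only the Chinese file name into `%E9%A9%AC%E8%B5%9B%E5%85%8B`. Because `&` and `=` are never escaped inside the query, this only fits values that do not contain those characters themselves.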

Johan Sjöberg
-1

First of all, you have to encode only the path component of the URL.

The following characters are reserved characters in a URI as per the URI specification, so URLEncoder will escape them.

":" / "/" / "?" / "#" / "[" / "]" / "@"

Reference:

URI Reserved Characters

Ramesh PVK