I have a strange issue with incorrect URI encoding and would appreciate any help!
The project uses JSPs, Servlets, jQuery and Tomcat 6.
The charset in the JSPs is set to UTF-8, all Tomcat connectors use URIEncoding=UTF-8, and I also use a character encoding filter as described here. In addition, I set the contentType in the meta tag, and my browser detects it correctly.
In Ajax calls with jQuery I use encodeURIComponent() on the terms I want to use as URL parameters and then serialize the whole parameter set with $.param(). In the called servlet these parameters are decoded correctly with java.net.URLDecoder.decode(term, "UTF-8").
In some places I generate URLs for href attributes from a parameter map in the JSPs. Each parameter value is encoded with java.net.URLEncoder.encode(value, "UTF-8") on the JSP side, but decoding it the same way as before results in broken special characters. Instead, I have to encode it as "ISO-8859-2" in the JSP, which is then decoded correctly as "UTF-8" in the servlet.
An example to clarify: the term "überfall" is URI-encoded via JavaScript (%C3%BCberfall) and sent to the servlet for decoding and processing, which works. After passing it back to a JSP, I encode it as UTF-8 and build the URL, which results for instance in:
<a href="/myWebapp/servletPath?term=%C3%BCberfall">Click here</a>
However, clicking this link will send the parameter as "%C3%83%C2%BCberfall" to the servlet, which decodes to "Ã¼berfall". The same occurs when no encoding takes place.
When using "ISO-8859-2" for encoding, I get:
<a href="/myWebapp/servletPath?term=%FCberfall">Click here</a>
When clicking this link I can observe in Wireshark that %C3%BCberfall is sent as parameter which decodes again to "überfall"!
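To make the two observations easier to compare, here is a minimal sketch of one way these exact byte sequences can arise. It assumes, purely for illustration, that something between the JSP and the servlet decodes the percent-encoded value as ISO-8859-1 and re-encodes it as UTF-8. The class ReencodeDemo and its method are hypothetical and not part of my webapp; I have not confirmed that this is what actually happens.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of the webapp: simulates a component that
// decodes a percent-encoded value as ISO-8859-1 and re-encodes it as UTF-8.
class ReencodeDemo {

    static String decodeLatin1ThenEncodeUtf8(String percentEncoded) {
        try {
            // Each percent-escaped byte becomes one character in ISO-8859-1.
            String misread = URLDecoder.decode(percentEncoded, "ISO-8859-1");
            // Every non-ASCII character is then written out as UTF-8 bytes.
            return URLEncoder.encode(misread, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Both charsets are mandatory on every JVM, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 encoded link value, as in the first href above
        System.out.println(decodeLatin1ThenEncodeUtf8("%C3%BCberfall"));
        // Single-byte encoded link value, as in the second href above
        System.out.println(decodeLatin1ThenEncodeUtf8("%FCberfall"));
    }
}
```

Under that assumption the first call yields %C3%83%C2%BCberfall and the second yields %C3%BCberfall, which matches what I see on the wire in both cases.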
Can anyone tell me what I am missing?
EDIT: While observing the Network Tab in Firebug I realized that by using
$.param({term : encodeURIComponent(term)});
the term is UTF-8 encoded twice, resulting in "%25C3%25BCberfall", i.e. the percent symbols are also percent-encoded. Analogously, it works for me if I call encode(term, "UTF-8") twice on each value from the parameter map.
Encoding once and not decoding the String results in "Ã¼berfall" again.
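The double encoding from the EDIT can be reproduced in isolation. The following is a minimal sketch, assuming the servlet reads the value via request.getParameter(), which already percent-decodes once before my explicit URLDecoder.decode() call; here that first container decode is only simulated by a second call to the same helper. The class DoubleEncodingDemo and its methods are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: shows why encoding twice survives two decode steps.
class DoubleEncodingDemo {

    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String decodeUtf8(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String term = "\u00fcberfall";     // "überfall", escaped to keep the source ASCII
        String once = encodeUtf8(term);    // percent-encoded UTF-8 bytes
        String twice = encodeUtf8(once);   // the '%' signs themselves become %25

        // Assumption: the first decode stands in for the container,
        // the second for the explicit URLDecoder.decode() in the servlet.
        String afterContainer = decodeUtf8(twice);
        String afterServlet = decodeUtf8(afterContainer);

        System.out.println(once);
        System.out.println(twice);
        System.out.println(afterServlet.equals(term));
    }
}
```

If that assumption holds, a value that is encoded only once would already be fully decoded by the container, and the extra URLDecoder.decode() in the servlet would be redundant for it.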