This is frustrating (to put it mildly) with servlets. The standard URL encoding must use UTF-8 yet servlets not only default to ISO-8859-1 but don't offer any way to change that with code.
Sure, you can call req.setCharacterEncoding("UTF-8") before you read anything, but for some ungodly reason this only affects the request body, not query string parameters. There is nothing in the servlet request interface to specify the encoding used for query string parameters.
Using ISO-8859-1 in your form is a hack. This ancient encoding will cause more problems than it solves, especially since browsers do not actually support ISO-8859-1 and always treat it as Windows-1252, whereas servlets treat ISO-8859-1 as ISO-8859-1. You will be screwed beyond belief if you go with this.
To change this in Tomcat, for example, you can use the URIEncoding attribute of the <Connector> element in server.xml:
<Connector ... URIEncoding="UTF-8" ... />
If you don't use a container that has these settings, can't change its settings, or have some other issue, you can still make it work, because ISO-8859-1 decoding maps every byte to exactly one character and so retains full information from the original binary. Re-encode the string to get the original bytes back, then decode them as UTF-8:
String correct = new String(request.getParameter("test").getBytes("ISO-8859-1"), "UTF-8");
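The claim that ISO-8859-1 loses nothing can be checked directly. The following is a minimal sketch, not part of any servlet API: the class name Latin1RoundTrip and the method isLossless are made up for illustration. It decodes all 256 possible byte values as ISO-8859-1, encodes the result again, and compares the two arrays.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, named for illustration only.
class Latin1RoundTrip {

    // Returns true if every possible byte value survives
    // decoding as ISO-8859-1 followed by encoding as ISO-8859-1.
    static boolean isLossless() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String decoded = new String(all, StandardCharsets.ISO_8859_1);
        byte[] back = decoded.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(all, back);
    }

    public static void main(String[] args) {
        System.out.println(isLossless());
    }
}
```

Because the round trip is exact, the bytes the browser sent can always be recovered from the wrongly decoded string. The same would not hold for a container that decoded the query string with a charset that rejects or replaces some byte sequences.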
So let's say the parameter is test=ä and everything is correctly set, so the browser encodes it as test=%C3%A4. Your servlet will incorrectly decode it as ISO-8859-1 and give you the resulting string "Ã¤". If you apply the correction, you get ä back:
System.out.println(new String("Ã¤".getBytes("ISO-8859-1"), "UTF-8").equals("ä"));
// true
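The whole scenario can also be reproduced without a servlet container. This is only a sketch under assumptions: QueryParamFix, decodeAsContainer and fix are hypothetical names, and java.net.URLDecoder stands in for whatever the container does internally when it decodes the query string as ISO-8859-1. Unicode escapes are used for the non-ASCII characters so the result does not depend on the source file encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, named for illustration only.
class QueryParamFix {

    // Stand-in for a container that percent-decodes the query string
    // and interprets the resulting bytes as ISO-8859-1.
    static String decodeAsContainer(String rawValue) {
        try {
            return URLDecoder.decode(rawValue, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // ISO-8859-1 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Recovers the original bytes and decodes them as UTF-8.
    // Returns null for a missing parameter.
    static String fix(String misdecoded) {
        if (misdecoded == null) {
            return null;
        }
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String wrong = decodeAsContainer("%C3%A4");
        String right = fix(wrong);
        // \u00C3\u00A4 is the mojibake pair, \u00E4 is a-umlaut.
        System.out.println(wrong.equals("\u00C3\u00A4"));
        System.out.println(right.equals("\u00E4"));
    }
}
```

Apply a fix like this only to values you know were decoded as ISO-8859-1. If the container is later switched to UTF-8, running the correction on an already correct string will corrupt it.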