
I have a problem with UTF-8. My client (written in GWT) makes a request to my servlet with some parameters in the URL, as follows:

http://localhost:8080/servlet?param=value

When I retrieve the URL in the servlet, the UTF-8 characters do not come out right. I use this code:

protected void service(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {

    request.setCharacterEncoding("UTF-8");

    String reqUrl = request.getRequestURL().toString();
    String queryString = request.getQueryString();
    System.out.println("Request: " + reqUrl + "?" + queryString);
    ...

So, if I call this url:

http://localhost:8080/servlet?param=così

the output is:

Request: http://localhost:8080/servlet?param=cos%C3%AC
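For reference, cos%C3%AC is not corrupted data: %C3%AC is the percent-encoded form of the two UTF-8 bytes of ì (U+00EC), which is still encoded because getQueryString() does not decode. A minimal standalone sketch to confirm this (the class and method names are hypothetical; the Charset overloads of URLEncoder/URLDecoder assume Java 10 or later):

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PercentEncodingDemo {
    // Percent-encodes every UTF-8 byte of s, to make the raw byte sequence visible
    static String utf8BytesAsPercent(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Produces the same percent-encoded form the client sent in the question
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Reverses the percent-encoding, interpreting the bytes as UTF-8
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(utf8BytesAsPercent("\u00EC")); // %C3%AC
        System.out.println(encode("cos\u00EC"));           // cos%C3%AC
        System.out.println(decode("cos%C3%AC"));           // the string "cos\u00EC", i.e. così
    }
}
```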

How can I set up the character encoding properly?

BalusC
Gabriele

5 Answers


From the HttpServletRequest#getQueryString() javadoc:

Returns: a String containing the query string or null if the URL contains no query string. The value is not decoded by the container.

Note the last statement. So you need to URL-decode it yourself using java.net.URLDecoder.

String queryString = URLDecoder.decode(request.getQueryString(), "UTF-8");
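Keep in mind that getQueryString() returns null when the URL has no query string, and URLDecoder.decode() throws a NullPointerException on null. A small null-safe wrapper, sketched as a plain static method so it runs without a servlet container (the name decodeQueryString is hypothetical; the Charset overload assumes Java 10+):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class QueryStringDecoding {
    // Decodes a raw query string such as the one returned by
    // HttpServletRequest#getQueryString(), treating the bytes as UTF-8.
    // Returns null unchanged instead of letting URLDecoder throw an NPE.
    static String decodeQueryString(String rawQuery) {
        if (rawQuery == null) {
            return null;
        }
        return URLDecoder.decode(rawQuery, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // In a servlet this would be decodeQueryString(request.getQueryString())
        System.out.println(decodeQueryString("param=cos%C3%AC"));
        System.out.println(decodeQueryString(null)); // null
    }
}
```

Decoding the whole string at once is fine for logging, as in the question, but not for extracting parameters: a value containing an encoded %26 would turn into a bare & after decoding.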

However, the normal way to gather parameters is just using HttpServletRequest#getParameter().

String param = request.getParameter("param"); // così

The servlet container has already URL-decoded it for you, provided you have configured it to use the correct encoding. The request.setCharacterEncoding() call only affects the request body (POST), not the request URI (GET). Also see Mirage's answer.

reevesy
BalusC
  • If I use the URLDecoder it works, but when I retrieve just the parameter with getParameter(), it doesn't... any suggestions? – Gabriele Jun 12 '10 at 17:08
  • You need to set the server URI encoding as Mirage114 explains. Also see [this article](http://balusc.blogspot.com/2009/05/unicode-how-to-get-characters-right.html#JSPServletRequest) – BalusC Jun 12 '10 at 17:13
  • So `getParameter()` does call `URLDecoder.decode()` somewhere? – Mr_and_Mrs_D Oct 06 '12 at 22:12
  • This works well, but I've noticed that '+' becomes ' ' after decoding, so we might have to use `URLDecoder.decode(request.getQueryString(), "UTF-8").replaceAll(" ", "+")`. Could there be other characters like this too? – Anand Rockzz Jan 13 '23 at 18:13
  • @Anand: That's indeed expected behavior. If this is not what you expected then you either need to adjust your expectations or to look for the solution in a different direction (XY-problem perhaps?). – BalusC Jan 13 '23 at 18:14
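The '+' behavior discussed in the comments follows from URLDecoder implementing the application/x-www-form-urlencoded format, in which '+' stands for a space. A quick sketch (hypothetical class name; Java 10+) showing that a literal plus has to be sent by the client as %2B, and that replacing spaces with '+' after decoding also rewrites spaces that were genuinely part of the value:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PlusSignDemo {
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A bare '+' in the raw query decodes to a space
        System.out.println(decode("q=a+b"));   // q=a b
        // A literal '+' must be percent-encoded by the client...
        System.out.println(encode("a+b"));     // a%2Bb
        // ...and then it survives decoding
        System.out.println(decode("q=a%2Bb")); // q=a+b
        // The replaceAll workaround also turns the real space (%20) into '+'
        System.out.println(decode("q=a%20b+c").replaceAll(" ", "+")); // q=a+b+c
    }
}
```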

I've run into this same problem before. I'm not sure which Java servlet container you're using, but at least in Tomcat 5.x (not sure about 6.x) the request.setCharacterEncoding() method has no real effect on GET parameters. By the time your servlet runs, Tomcat has already decoded the GET parameters, so setCharacterEncoding won't do anything.

Two ways to get around this:

  1. Change the URIEncoding setting for your connector to UTF-8. See http://tomcat.apache.org/tomcat-5.5-doc/config/http.html.

  2. As BalusC suggests, decode the query string yourself and manually parse it into a parameter map (as opposed to using the ServletRequest APIs).
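Option 2 might be sketched as a small parser like the one below. It is a hypothetical helper, not a Servlet API method, and it assumes UTF-8 and Java 10+. Unlike decoding the whole query string first, it splits on '&' and on the first '=' before decoding, so an encoded %26 or %3D inside a value is not mistaken for a separator:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryStringParser {
    // Parses a raw (still percent-encoded) query string into name -> values,
    // decoding each name and value as UTF-8. Repeated names keep all values.
    // URLDecoder throws IllegalArgumentException on malformed escapes like "%G1".
    static Map<String, List<String>> parse(String rawQuery) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String rawName = eq >= 0 ? pair.substring(0, eq) : pair;
            String rawValue = eq >= 0 ? pair.substring(eq + 1) : "";
            String name = URLDecoder.decode(rawName, StandardCharsets.UTF_8);
            String value = URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) {
        // In a servlet this would be parse(request.getQueryString())
        Map<String, List<String>> params = parse("param=cos%C3%AC&q=a%26b&q=x");
        System.out.println(params.get("q")); // [a&b, x]
    }
}
```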

Hope this helps!

schematic
  • The URIEncoding setting in #1 is in Tomcat's server.xml. Other servlet containers likely have a similar setting. – schematic Jun 12 '10 at 17:10
  • For #2, you can't use the request.getParameter() method anymore, because that method retrieves parameters that have been incorrectly decoded. You have to take the decoded query string (produced from getQueryString()) and parse it manually (e.g. split the string by ampersand characters '&' then split the resulting strings by the first equal sign '='). – schematic Jun 12 '10 at 17:13
  • I ran into a problem with the server.xml setting. On Windows machines it worked correctly, but on our production Red Hat-based machines Tomcat appeared to ignore the server.xml setting. We ended up having to implement our own query parameter parser that explicitly decoded it using UTF-8. – Herms Jun 12 '10 at 17:29
  • This is one of the many places where Java's over-reliance on the ‘default encoding’ causes heavy breakage. The encoding you want in URLs is almost always UTF-8, and almost never the server's default encoding. – bobince Jun 12 '10 at 17:37

It really took all day, but:

final String param = new String(request.getParameter("param").getBytes(
                "iso-8859-1"), "UTF-8");

See also here. Note that this is valid only if the server's decoding charset (URIEncoding in Tomcat) is ISO-8859-1; otherwise that charset must be passed in instead. For an example of how to get the URIEncoding charset from server.xml for Tomcat 7, see my quoted answer.
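The trick works because ISO-8859-1 maps each byte 0x00 to 0xFF to exactly one char, so the bytes the client sent can be recovered losslessly. A standalone sketch that simulates the container's ISO-8859-1 decoding with URLDecoder (class and method names are hypothetical; Java 10+):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class RedecodeDemo {
    // Simulates a container whose URIEncoding is ISO-8859-1:
    // the two UTF-8 bytes C3 AC become the two chars U+00C3 U+00AC
    static String decodeLikeLatin1Container(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.ISO_8859_1);
    }

    // The workaround: get the original bytes back, then decode them as UTF-8
    static String fixMisdecoded(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String wrong = decodeLikeLatin1Container("cos%C3%AC"); // "cos\u00C3\u00AC", i.e. cosÃ¬
        String right = fixMisdecoded(wrong);                   // "cos\u00EC", i.e. così
        System.out.println(wrong + " -> " + right);
    }
}
```

If the container already decodes as UTF-8, fixMisdecoded itself corrupts the value (ì becomes the single byte 0xEC, which is not valid UTF-8 on its own), which is the risk raised in the comments below.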

Community
Mr_and_Mrs_D
  • This is relying on the server's default charset being UTF-8; instead pass that charset into the string constructor. Also you don't need to URL-decode anything that has come out of 'getParameter'. – bobince Jun 20 '13 at 14:12
  • @bobince : you are very right (and I knew it) - I still hadn't found the time to go through my answers - edited – Mr_and_Mrs_D Jun 20 '13 at 14:25
  • It's a creative solution, but the problem, as you say, is that it relies on knowing the container's URIEncoding. If you have control over that, you should just change it to UTF-8. In the more likely case that you don't know a priori what the encoding will be, your solution will probably lead to more headaches than anything else. – thomas88wp Jun 03 '15 at 14:35
  • @thomas88wp: You did not follow the link did you ? – Mr_and_Mrs_D Jun 03 '15 at 16:13
  • @Mr_and_Mrs_D, yes I've seen this post, but wouldn't you agree it's a bit of a protracted solution? If your code has an abundance of container-specific logic, it just doesn't seem like an ideal solution. And I'm not faulting you, since you give fair warning. I'm just concerned that the average person is going to see your 2 lines of code, test on their local machine, see that it works, and ignore said warning (as I almost did). Again, none of this is to criticize your answer. – thomas88wp Jun 04 '15 at 16:37

For POST requests I resolved the problem this way:

  1. Set the URIEncoding="UTF-8" attribute on the Connector in server.xml (I use Tomcat 8).
  2. Call request.setCharacterEncoding("UTF-8") before retrieving any parameters.

After that, UTF-8 characters were delivered correctly.

For example:

String name = request.getParameter("name");

name then contains the correct UTF-8 string.

Dhwanil Patel
Alexander Drobyshevsky

Many factors affect the encoding of HTTP request parameters. You can work through this problem with the following sequential guide.

1. Check your form's accept-charset attribute.

<form id="edit-box" name="edit-box-name" method="post" accept-charset="UTF-8">

2. Check the HTTP server's default character encoding. For Apache HTTP Server, add the line "AddDefaultCharset UTF-8" to the httpd.conf file.

3. If you have a back-end server, check its character encoding. For a Tomcat back end, add the URIEncoding="UTF-8" attribute to your Connector, like this:

<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000"  redirectPort="8443" URIEncoding="UTF-8"/>

...

guide for http request parameter encoding problems

nominor