25

I am trying to read UTF-8 data from the request. I called `request.setCharacterEncoding("UTF-8");`, but it seems to do nothing: the data I read is not UTF-8.

What am I doing wrong?

BalusC
Erik Sapir

8 Answers

24

If you are using Tomcat, you should also set URIEncoding to UTF-8 in your connectors:

<Server port="8105" shutdown="SHUTDOWN">
...
    <Service name="Catalina">
        <Connector port="8180" URIEncoding="UTF-8" />
        <Engine name="Catalina" defaultHost="localhost">
            <Host name="localhost" appBase="webapps" />
        </Engine>
    </Service>
</Server>
Maurice Perry
  • I think he's trying to read request data and does not know how to decode it right. This flag does not change how request data is encoded, it tells the server how URIs (URLs) are to be encoded. – Sylar Jul 19 '10 at 07:20
  • Actually, it tells Tomcat to use UTF-8 when decoding URLs sent by the browser; if you do not specify it, it will use ISO-8859-1. If a URL contains form parameters, they will not be decoded correctly. – Maurice Perry Jul 19 '10 at 08:33
  • any non-container specific solution? I have same issue with jetty. – Jus12 Sep 23 '13 at 00:06
  • sorry, the solution is jetty-specific: http://wiki.eclipse.org/Jetty/Howto/International_Characters#International_characters_in_URLs – Maurice Perry Sep 23 '13 at 07:27
19

HttpServletRequest#setCharacterEncoding() only has an effect when the request is a POST request and the request body has not been processed yet.

So if it doesn't work in your case, then it can have two causes:

  1. You're actually firing a GET request, i.e. the request parameters are sent from client to server in the request URL instead of the request body. The request URL is processed by the webserver, not by the Servlet API. To fix this, you need to configure the webserver in question to decode the request URL (URI) using the specified character encoding. In the case of Apache Tomcat, for example, you need to set the URIEncoding attribute of the <Connector> element in server.xml to UTF-8.

  2. You're correctly using POST, but you've already (indirectly) processed the request body, so it's too late to change the character encoding. The request body is fully processed on the first call to any of the getParameterXXX() methods, and it is not re-processed on subsequent calls. When nailing down who's making that first call, don't forget to take all Filter instances declared in web.xml into account. Some of them might grab and scan the parameters.
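To see why the first cause produces garbage, here is a minimal sketch using only the JDK. It is not what any particular container runs internally: `QueryDecodingDemo` and its `decode` helper are hypothetical names, and `caf%C3%A9` is an assumed example of a browser percent-encoding "café" as UTF-8. The same bytes are decoded once as UTF-8 and once as ISO-8859-1:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the Servlet API or of any container.
class QueryDecodingDemo {

    // Decodes one percent-encoded query value with the given charset,
    // which is roughly the job a server has to do for a GET request URL.
    static String decode(String rawValue, Charset charset) {
        try {
            return URLDecoder.decode(rawValue, charset.name());
        } catch (UnsupportedEncodingException e) {
            // Cannot happen for the standard charsets used below.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "caf%C3%A9"; // assumed sample: "café" sent as UTF-8

        String asUtf8 = decode(raw, StandardCharsets.UTF_8);
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);

        // Compare against escaped literals so the result does not depend
        // on the encoding of this source file or of the console.
        System.out.println(asUtf8.equals("caf\u00e9"));          // 4 chars, ends in U+00E9
        System.out.println(asLatin1.equals("caf\u00c3\u00a9"));  // 5 chars, ends in U+00C3 U+00A9
    }
}
```

The second result is the typical garbled form of a GET parameter when the URL is decoded as ISO-8859-1, and nothing called on the request afterwards changes how the URL was decoded.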

If that still doesn't help, then the only possible cause left is that the display console, logger, or whatever you're using to print or debug the obtained request parameter does not support UTF-8. In that case, reconfigure the console or logger to use UTF-8 to display the characters. If it's for example the Eclipse console, then you can set it via Window > Preferences > General > Workspace > Text File Encoding.
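One way to rule out the console or logger is to print the code points of the value instead of the characters themselves. The following is only a sketch under that assumption: `CodePointDump` and `describe` are hypothetical names, and in a servlet you would pass in the string returned by request.getParameter(...) rather than the literals used here.

```java
import java.util.stream.Collectors;

// Hypothetical helper; prints what a string really contains,
// independent of how the console renders non-ASCII characters.
class CodePointDump {

    // Renders every code point as U+XXXX, separated by spaces.
    static String describe(String value) {
        return value.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Correctly decoded "café": the last code point is U+00E9.
        System.out.println(describe("caf\u00e9"));
        // The same bytes decoded as ISO-8859-1: U+00C3 U+00A9 at the end.
        System.out.println(describe("caf\u00c3\u00a9"));
    }
}
```

If the dump shows the expected code points but the console still shows garbage, the parameter was decoded correctly and only the display is wrong.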

BalusC
  • Point 2 was the reason for me , you nailed it! thanks – A.Alqadomi May 31 '17 at 08:38
  • Docs say that: `Overrides the name of the character encoding used in the body of this request.`. Does it mean it changes the character encoding of the body, or only changes the HTTP header for the request? – valijon Oct 19 '18 at 09:04
5

this method is really stupid. it shouldn't be there, and you shouldn't use it.

for a body in a POST request, the encoding should have been explicitly defined by the client in the Content-Type header. if not, it's a bad request. [1]

for a GET request URI, the client cannot specify the encoding, so the server must assume one implicitly. the programmer needs a way to set that encoding, yet no such method exists in the Servlet API!

however, your servlet container could have a proprietary way of doing that.

the best way is probably to set the default encoding of your JVM to UTF-8.

1: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1

The "charset" parameter is used with some media types to define the character set (section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value.
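As an illustration of the rule quoted above, here is a rough sketch of reading the charset parameter from a Content-Type header value and falling back to ISO-8859-1 when it is missing. `ContentTypeCharset` and `charsetOf` are hypothetical names, the parsing is deliberately simplistic, and using the same fallback for non-"text" types such as form posts is an assumption, not something the quoted passage states.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper; a simplified reading of the charset parameter.
class ContentTypeCharset {

    // Returns the charset named in the header value, or ISO-8859-1
    // when the header is absent or carries no charset parameter.
    // Charset.forName throws an unchecked exception for unknown names.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim();
                    // Strip optional surrounding double quotes.
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.ISO_8859_1; // default from the quoted section
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8"));
        System.out.println(charsetOf("application/x-www-form-urlencoded"));
    }
}
```

Browsers commonly send form posts without a charset parameter, which is the second case here and the reason the server-side default matters so much.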

irreputable
  • Tell the clients who are responsible for sending the header, and/or the inventor of the HTTP spec, that sending the encoding along with the content type header is mandatory. – BalusC Jul 19 '10 at 16:14
2

The problem depends on which application server is used. The best description I found is in this link.

In some application servers, request.setCharacterEncoding(...) has no effect until you set the application encoding using a descriptor. The most complicated are JBoss, Apache Tomcat, and GlassFish. WebLogic is better, and Jetty is the best (UTF-8 is the default setting).

In my case, for GlassFish, I had to create a glassfish-web.xml descriptor and put the parameter-encoding tag in it:

<glassfish-web-app error-url="">
    <!-- request.setCharacterEncoding("UTF-8") not functioning without this setting-->
    <parameter-encoding default-charset="UTF-8" />
</glassfish-web-app>
cassiomolin
hariprasad
  • Thank you, that worked with glassfish 4.1. Note that the user needs to add request.setCharacterEncoding too - the parameter is not enough. – Panayotis Mar 24 '15 at 10:23
1

Are you calling it after a request.getParameter() call?

request.setCharacterEncoding("UTF-8") must be called prior to any request.getParameter() call.

Jens
sushil bharwani
  • And for Tomcat (at least) that includes any calls to `getParameter()` made in any filters, or any valves. (So don't use RequestDumperValve!) – Stephen C Jul 19 '10 at 07:01
  • I am setting character encoding first thing. As answered below, works fine with POST method, but does not work with GET method – Erik Sapir Jul 19 '10 at 07:59
1

Just to confirm: for POST parameters, you have to call request.setCharacterEncoding(...) before getting any parameters. For GET parameters, it depends on which web container you are using (use Maurice Perry's answer for Tomcat).

Please check this link for more info. "Character Conversions from Browser to Database" http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/
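If the container cannot be reconfigured, a workaround that is sometimes used for GET parameters is to undo the wrong decoding by hand. This is only a sketch, and it assumes the container decoded the URL as ISO-8859-1 (which maps every byte to exactly one character, so the original bytes can be recovered). `ParameterRedecoder` and `redecodeAsUtf8` are hypothetical names, not part of any API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; only valid when the value was wrongly decoded
// as ISO-8859-1 although the client sent UTF-8 bytes.
class ParameterRedecoder {

    // Turns the mis-decoded characters back into the original bytes,
    // then decodes those bytes as UTF-8.
    static String redecodeAsUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café" sent as UTF-8 but decoded as ISO-8859-1 by the server.
        String garbled = "caf\u00c3\u00a9";
        System.out.println(redecodeAsUtf8(garbled).equals("caf\u00e9"));
    }
}
```

Applying this to a value that was already decoded correctly corrupts it, so it should only be used when you know which encoding the container applied. Configuring the container, as described above, is the cleaner fix.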

Buhake Sindi
Virasak
  • Updated link to the very useful article mentioned here: http://www.oracle.com/technetwork/articles/javase/httpcharset-142283.html – fausto Oct 03 '12 at 11:53
0

(As for the very first question.)
If you read parameters from the body, it is also possible to read each item with its own encoding (look at the item.getString("UTF-8") call):

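// Fragment from inside a method: request, logger and uploads come from the surrounding code.
// The classes are from Apache Commons FileUpload (org.apache.commons.fileupload).
ServletFileUpload upload = new ServletFileUpload(new DiskFileItemFactory());
List<FileItem> items;
try {
    items = upload.parseRequest(request);
} catch (FileUploadException ex) {
    logger.warn("Fail during file upload");
    return uploads;
}

for (FileItem item : items) {
    if (item.isFormField()) {
        String name = item.getFieldName();
        System.out.println("name: " + name);
        // Without an argument, the item's default character encoding is used.
        String value = item.getString();
        // With an argument, this one item is decoded using the given encoding.
        // Throws UnsupportedEncodingException, which the enclosing method must handle.
        System.out.println("get as utf8 - " + item.getString("UTF-8"));
    }
}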
Matthias
ozma
0

For JBoss/WildFly there is a feature request: https://issues.jboss.org/browse/WFLY-2533

Drop this into WEB-INF/jboss-web.xml:

<?xml version="1.0" encoding="UTF-8"?>
<jboss-web version="8.0" xmlns="http://www.jboss.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.jboss.org/j2ee/schema/jboss-web_8_0.xsd">
    <!-- browsers tend not to send encoding information, so we have to match the servlet container's
    default encoding with our requested form data encoding: -->
    <default-encoding>UTF-8</default-encoding>
</jboss-web>
user1050755