
How can I make a servlet accept non-ASCII (Arabic, Chinese, etc.) characters passed from JSPs?

I've tried adding the following to the top of the JSPs:

<%@page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

And to add the following in each post/get method in the servlet:

request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");

I've also tried adding a Filter that executes the above two statements, instead of calling them in the servlet.

To be honest, this used to work, but it no longer does.

I am using Tomcat 5.0.28/6.x.x on JDK 1.6, on both Windows and Linux boxes.

Here's an example: JSP Page:

<%@page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<html>
<head>
<title>Push Engine</title>
</head>
<body>
Hello ${requestScope['val']}
<form action="ControllerServlet" method="POST">
<table>
    <tr>
        <td>ABC</td>
        <td><input name="ABC" type="text" /></td>
    </tr>
    <tr>
        <td></td>
        <td><input type="submit" value="Submit"></td>
    </tr>
</table>
</form>

</body>
</html>

Servlet doPost method:

protected void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    request.setCharacterEncoding("UTF-8");
    String val = "request.getParameter('ABC') : " + request.getParameter("ABC");
    System.out.println(val);
    request.setAttribute("val", val);
    request.getRequestDispatcher("index.jsp").forward(request, response);
}

The problem: the console prints "???", yet the value returned to the JSP page contains the correct Unicode word.

Update: the "???" printed to the console turned out to be a problem with the machine I ran this test on. I ran the same example on another machine, and it works properly!
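The "???" symptom can be reproduced without a servlet container. The following is a minimal sketch, not the asker's code: the class name `ConsoleEncodingDemo` and the helper `roundTrip` are made up for illustration. It shows that a correct Java string degrades to question marks as soon as it is encoded with a charset that cannot represent its characters, which is what happens when the console's encoding is too narrow.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical demo class; the names are illustrative only.
final class ConsoleEncodingDemo {

    // Encodes the text with the given charset and decodes it again with the same charset.
    // Characters the charset cannot represent are replaced by its replacement byte ('?' for US-ASCII).
    static String roundTrip(String text, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Four Arabic letters, written as escapes so the source file encoding does not matter.
        String word = "\u0633\u0644\u0627\u0645";

        // Simulates a console whose encoding cannot represent Arabic.
        System.out.println(roundTrip(word, "US-ASCII"));

        // Writes UTF-8 bytes to stdout; the terminal must also be configured for UTF-8 to render them.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(word);
    }
}
```

The string held by the servlet is intact in both cases; only the bytes written to the console differ, which would be consistent with the JSP showing the right word while the console does not.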

Muhammad Hewedy

3 Answers


To the point, you need to set the request encoding.

For GET requests (where the parameters are passed through the request URL), you need to configure this at the appserver level. In Tomcat 6.0, for example, it suffices to set the URIEncoding attribute of the <Connector> element in /conf/server.xml to UTF-8.

<Connector (...) URIEncoding="UTF-8" />

For POST requests (where the parameters are passed "invisibly" through the request body), you need to call ServletRequest#setCharacterEncoding() with UTF-8 before reading any request parameter. The best place to do this is in a filter that is invoked as the very first filter in the chain:

if (request.getCharacterEncoding() == null) {
    request.setCharacterEncoding("UTF-8");
}
chain.doFilter(request, response);
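To see why the encoding has to be set before any parameter is read, here is a container-free sketch of what goes wrong otherwise. It assumes the container falls back to ISO-8859-1 when no request encoding is given (the servlet specification's default); the class and method names are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; it simulates the decoding step a container performs.
final class RequestEncodingDemo {

    static final Charset UTF_8 = Charset.forName("UTF-8");
    static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");

    // The browser sends UTF-8 bytes; the container decodes them as ISO-8859-1.
    static String decodeAsLatin1(String sent) {
        return new String(sent.getBytes(UTF_8), ISO_8859_1);
    }

    // Reverses the wrong decoding. This is lossless because ISO-8859-1 maps every byte to exactly one char.
    static String repair(String garbled) {
        return new String(garbled.getBytes(ISO_8859_1), UTF_8);
    }

    public static void main(String[] args) {
        String word = "\u0633\u0644\u0627\u0645"; // four Arabic letters
        String garbled = decodeAsLatin1(word);
        System.out.println(garbled.length());            // one char per UTF-8 byte
        System.out.println(repair(garbled).equals(word));
    }
}
```

The `repair` step is only a workaround for values that were already decoded with the wrong charset; setting the request encoding up front, as in the filter, avoids the problem in the first place.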
BalusC
  • Thanks so much, this is exactly what I wanted. My problem was that I sent a GET request, not a POST – Muhammad Hewedy Apr 13 '10 at 16:31
  • So, isn't there any programmatic (as opposed to configuration-based) way to solve this GET issue? – Muhammad Hewedy Apr 13 '10 at 17:19
  • You could parse the [HttpServletRequest#getQueryString()](http://java.sun.com/javaee/5/docs/api/javax/servlet/http/HttpServletRequest.html#getQueryString%28%29) yourself. It's not decoded by the container. To abstract this more, you could provide a [HttpServletRequestWrapper](http://java.sun.com/javaee/5/docs/api/javax/servlet/http/HttpServletRequestWrapper.html) implementation which does exactly that on all the getParameter() methods. – BalusC Apr 13 '10 at 17:28
  • You need to configure the console to output characters as UTF-8 as well. Also see http://balusc.blogspot.com/2009/05/unicode-how-to-get-characters-right.html#DevelopmentEnvironment (read the entire article though). – BalusC Apr 13 '10 at 20:24
  • It was actually a problem with my system! Thanks – Muhammad Hewedy Apr 14 '10 at 15:02
  • @BalusC Hi, I have the same problem. I added what you wrote about the POST method, but I still have the problem. Here is my question: http://stackoverflow.com/questions/31448655/how-can-i-read-a-utf-8-value-in-my-servlet – yaylitzis Jul 16 '15 at 17:23
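The programmatic route for GET requests mentioned in the comments above (parsing the raw query string yourself) might look roughly like the sketch below. The class `QueryStringParser` and its `parse` method are hypothetical, and the sketch keeps only the first value of a repeated parameter; it relies solely on `java.net.URLDecoder`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of the Servlet API.
final class QueryStringParser {

    // Parses a raw, still percent-encoded query string and decodes names and values as UTF-8.
    // Only the first value of a repeated parameter is kept in this sketch.
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            String key = decode(name);
            if (!params.containsKey(key)) {
                params.put(key, decode(value));
            }
        }
        return params;
    }

    private static String decode(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

In a servlet, the input would presumably be `request.getQueryString()`, and an `HttpServletRequestWrapper` subclass could delegate its `getParameter()` calls to a map built this way.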

Setting the content type of the page is communication from your server to the browser about what the server is sending it, and that's not really going to help you much. What you need to ensure is that your client-to-server communication has the right character encoding, and that your server is running with the correct locale. The precise way you set that up depends on the framework you're using and how your server is configured; the first thing to do would be to make sure that your server is launched with the right locale in the environment (the LC_ALL variable probably).

Note that the client may try to tell your server what locale it wants, and that's something your framework would probably help you with. (It'd be a header in the HTTP request.)

Pointy
if (request.getCharacterEncoding() == null) {
    request.setCharacterEncoding("UTF-8");
}

This worked for me. I set charset=UTF-8 in the JSP's META tag and added the above code to the servlet. After that, Arabic data was saved correctly in the Oracle database.

wattostudios