
I have a link like this in a JSP page with Big5 encoding: http://hello/world?name=婀ㄉ. When I type it into the browser's URL bar, it gets changed to something like http://hello/world?name=%23%24%23. When we then read this parameter in the JSP page, all the characters are corrupted.

We have already set request.setCharacterEncoding("UTF-8"), so all requests should be converted to UTF-8.

So why doesn't it work in this case? Thanks in advance!

MemoryLeak

5 Answers


When you enter the URL in the browser's address bar, the browser may convert the character encoding before URL-encoding. However, this behavior is not well defined; see my question:

Handling Character Encoding in URI on Tomcat

We mostly get UTF-8 or Latin-1 from newer browsers, but all kinds of encodings (including Big5) from older ones. So it's best to avoid non-ASCII characters in URLs that users type in directly.

If the URL is embedded in the JSP, you can force it into UTF-8 by generating it like this:

String link = "http://hello/world?name=" + URLEncoder.encode(name, "UTF-8");
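As a rough illustration of what that line produces, here is a small standalone sketch. It assumes Java 10+ for the `URLEncoder.encode(String, Charset)` overload; the class and method names (`EncodeLinkDemo`, `buildLink`) are hypothetical, and the two characters from the question are written as Unicode escapes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeLinkDemo {

    // Only the parameter value is encoded; the "http://hello/world?name="
    // prefix is left alone so its ':', '/', '?' and '=' keep their meaning.
    public static String buildLink(String name) {
        return "http://hello/world?name=" + URLEncoder.encode(name, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The two characters from the question, U+5A40 and U+3109
        System.out.println(buildLink("\u5A40\u3109"));
        // prints http://hello/world?name=%E5%A9%80%E3%84%89

        // Form encoding: a space becomes '+', '&' becomes %26
        System.out.println(URLEncoder.encode("a b&c", StandardCharsets.UTF_8));
        // prints a+b%26c
    }
}
```

Because the link is generated server-side, the bytes in the URL no longer depend on how the browser treats typed-in text.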

On Tomcat, the encoding needs to be specified on the Connector, like this:

<Connector port="8080" URIEncoding="UTF-8"/>
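URIEncoding is the charset the connector uses to turn the percent-encoded query bytes back into characters. The sketch below, which only simulates that step with `URLDecoder` (Java 10+ `Charset` overload; the class name is hypothetical), shows why a mismatch produces the corrupted characters from the question. It assumes the browser sent UTF-8 and that the server decodes with ISO-8859-1, which was Tomcat 7's default when URIEncoding is not set.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class UriEncodingDemo {

    // Percent-encoded query value for U+5A40 U+3109 when the browser used UTF-8
    public static final String RAW = "%E5%A9%80%E3%84%89";

    public static void main(String[] args) {
        // Decoded with the matching charset: the original two characters
        String ok = URLDecoder.decode(RAW, StandardCharsets.UTF_8);
        // Decoded as ISO-8859-1: each of the six bytes becomes its own
        // Latin-1 character, i.e. mojibake
        String broken = URLDecoder.decode(RAW, StandardCharsets.ISO_8859_1);
        System.out.println(ok.length());     // 2
        System.out.println(broken.length()); // 6
    }
}
```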

You also need to call request.setCharacterEncoding("UTF-8") for the body encoding, but it's not safe to do this in a servlet: it only takes effect if the parameters haven't been parsed yet, and another filter or valve may already have triggered that parsing. So you should do it in a filter. Tomcat comes with such a filter in the source distribution.
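To make the ordering problem concrete, here is a hypothetical toy model in plain Java. `ToyRequest` is not the Servlet API; it only mimics the rule that the character encoding is fixed when parameters are first parsed, and that a later `setCharacterEncoding` is ignored. It assumes Java 10+ for the `URLDecoder.decode(String, Charset)` overload.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, NOT the Servlet API.
public class LazyParamsDemo {

    static class ToyRequest {
        private final String body;          // raw form body, e.g. "name=%E5..."
        private Charset encoding = StandardCharsets.ISO_8859_1;
        private Map<String, String> params; // null until first access

        ToyRequest(String body) { this.body = body; }

        // Ignored once the parameters have been parsed
        void setCharacterEncoding(Charset cs) {
            if (params == null) encoding = cs;
        }

        String getParameter(String name) {
            if (params == null) {            // parse lazily, exactly once
                params = new HashMap<>();
                for (String pair : body.split("&")) {
                    String[] kv = pair.split("=", 2);
                    params.put(URLDecoder.decode(kv[0], encoding),
                               kv.length > 1 ? URLDecoder.decode(kv[1], encoding) : "");
                }
            }
            return params.get(name);
        }
    }

    public static String readName(boolean earlyComponentReadsParams) {
        ToyRequest req = new ToyRequest("name=%E5%A9%80%E3%84%89");
        if (earlyComponentReadsParams) {
            req.getParameter("anything");    // e.g. a valve or filter running first
        }
        req.setCharacterEncoding(StandardCharsets.UTF_8); // the servlet's call
        return req.getParameter("name");
    }

    public static void main(String[] args) {
        System.out.println(readName(false).equals("\u5A40\u3109")); // true
        System.out.println(readName(true).equals("\u5A40\u3109"));  // false
    }
}
```

Setting the encoding in a filter that runs before everything else corresponds to the `readName(false)` path.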

ZZ Coder

To avoid fiddling with server.xml, use:

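import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

protected static final String CHARSET_FOR_URL_ENCODING = "UTF-8";

// Appends the URL-encoded parameter value to the base link. Plain
// concatenation avoids String.format misreading any '%' already in baseLink.
protected String encodeString(String baseLink, String parameter)
        throws UnsupportedEncodingException {
    return baseLink + URLEncoder.encode(parameter, CHARSET_FOR_URL_ENCODING);
}

// Used in the servlet code to generate GET requests
response.sendRedirect(encodeString("userlist?name=", name));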

To actually read those parameters on Tomcat, you need to do something like:

final String name =
        new String(request.getParameter("name").getBytes("iso-8859-1"), "UTF-8");

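This works because request.getParameter() apparently URL-decodes the string and interprets the bytes as ISO-8859-1, or whatever URIEncoding is set to in server.xml. The re-decoding only round-trips correctly when that charset is ISO-8859-1, since ISO-8859-1 maps each byte to exactly one character. For an example of how to get the URIEncoding charset from server.xml on Tomcat 7, see here.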
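The round trip can be checked outside a container. In this sketch `containerDecode` is a hypothetical stand-in for what `getParameter()` returns when the query was UTF-8 but the connector decodes as ISO-8859-1; `redecode` is the same `getBytes`/`new String` trick as above, using `StandardCharsets` instead of charset names (Java 10+ for the `URLDecoder` overload).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class RedecodeDemo {

    // Hypothetical stand-in for the container: UTF-8 percent-encoded input,
    // decoded as ISO-8859-1 (URIEncoding left at that value).
    public static String containerDecode(String rawQueryValue) {
        return URLDecoder.decode(rawQueryValue, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps every byte 0x00-0xFF to exactly one char, so
    // getBytes(ISO_8859_1) recovers the original bytes, which are then
    // decoded as UTF-8.
    public static String redecode(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = containerDecode("%E5%A9%80%E3%84%89");
        System.out.println(garbled.length());                         // 6
        System.out.println(redecode(garbled).equals("\u5A40\u3109")); // true
    }
}
```

If URIEncoding is already UTF-8, `getParameter()` returns the correct string and applying the trick on top of it would corrupt it, so use one approach or the other.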

Mr_and_Mrs_D

You cannot have non-ASCII characters in a URL; you always need to percent-encode them. When doing so, browsers may have difficulty rendering them. Rendering works best if you encode the URL in UTF-8 and then percent-encode it. For your specific URL, this gives http://hello/world?name=%E5%A9%80%E3%84%89 (check what your browser shows for this link). When you get the parameter in the JSP, you need to explicitly unquote (percent-decode) it and then decode the resulting bytes from UTF-8, as the browser sends them as-is.
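The two steps described here, unquoting to bytes and then decoding those bytes as UTF-8, can be written out explicitly. This is a minimal sketch for illustration (class name hypothetical): it assumes well-formed, ASCII-only input and does not turn '+' into a space. In practice `URLDecoder.decode(s, "UTF-8")` performs both steps, including the '+' handling.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class PercentDecodeDemo {

    // Step 1 ("unquote"): turn %XX escapes back into raw bytes.
    // Assumes well-formed, ASCII-only input; '+' is passed through unchanged.
    public static byte[] unquote(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c); // ASCII characters pass through as single bytes
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = unquote("%E5%A9%80%E3%84%89");
        System.out.println(bytes.length); // 6
        // Step 2: interpret the bytes as UTF-8
        String name = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(name.equals("\u5A40\u3109")); // true
    }
}
```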

Martin v. Löwis
    But how can I encode it, i.e. change it from non-ASCII to percent-encoding? What function should I use in Java? – MemoryLeak Sep 02 '09 at 07:13

I had a problem with JBoss 7.0, and I think this filter solution also works with Tomcat:

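// Needs imports: java.io.IOException, java.lang.reflect.Field, java.lang.reflect.Method,
// java.lang.reflect.InvocationTargetException, javax.servlet.*, and
// javax.servlet.http.HttpServletRequest / HttpServletResponse.
// MyAppConfig and MyLogger are the poster's own application classes.
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {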

    HttpServletRequest httpRequest = (HttpServletRequest) request;
    HttpServletResponse httpResponse = (HttpServletResponse) response;

    try {
        httpRequest.setCharacterEncoding(MyAppConfig.getAppSetting("System.Character.Encoding"));

        String appServer = MyAppConfig.getAppSetting("System.AppServer");
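        // JBoss 7 only: reach into the container's internal request objects via
        // reflection (private fields, not a public API) and set the query-string
        // encoding directly on its Parameters object.
        if ("JBOSS7".equalsIgnoreCase(appServer)) {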
            Field requestField = httpRequest.getClass().getDeclaredField("request");
            requestField.setAccessible(true);
            Object requestValue = requestField.get(httpRequest);

            Field coyoteRequestField = requestValue.getClass().getDeclaredField("coyoteRequest");
            coyoteRequestField.setAccessible(true);
            Object coyoteRequestValue = coyoteRequestField.get(requestValue);

            Method getParameters = coyoteRequestValue.getClass().getMethod("getParameters");
            Object parameters = getParameters.invoke(coyoteRequestValue);

            Method setQueryStringEncoding = parameters.getClass().getMethod("setQueryStringEncoding", String.class);
            setQueryStringEncoding.invoke(parameters, MyAppConfig.getAppSetting("System.Character.Encoding"));

            Method setEncoding = parameters.getClass().getMethod("setEncoding", String.class);
            setEncoding.invoke(parameters, MyAppConfig.getAppSetting("System.Character.Encoding"));
        }

    } catch (NoSuchMethodException nsme) {
        System.err.println(nsme.getLocalizedMessage());
        nsme.printStackTrace();
        MyLogger.logException(nsme);
    } catch (InvocationTargetException ite) {
        System.err.println(ite.getLocalizedMessage());
        ite.printStackTrace();
        MyLogger.logException(ite);
    } catch (IllegalAccessException iae) {
        System.err.println(iae.getLocalizedMessage());
        iae.printStackTrace();
        MyLogger.logException(iae);

    } catch(Exception e) {
        MyLogger.logException(e);
    }

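    try {
        httpResponse.setCharacterEncoding(MyAppConfig.getAppSetting("System.Character.Encoding"));
    } catch(Exception e) {
        MyLogger.logException(e);
    }

    // Pass the request on; without this the filter would end the chain here.
    chain.doFilter(request, response);
}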
Joel
ff9will

I did quite a bit of searching on this issue, so this might help others who are experiencing the same problem on Tomcat. This is taken from http://wiki.apache.org/tomcat/FAQ/CharacterEncoding.

How to use UTF-8 everywhere:

  • Set URIEncoding="UTF-8" on your <Connector> in server.xml. References: HTTP Connector, AJP Connector.
  • Use a character encoding filter with the default encoding set to UTF-8.
  • Change all your JSPs to include charset name in their contentType. For example, use <%@page contentType="text/html; charset=UTF-8" %> for the usual JSP pages and <jsp:directive.page contentType="text/html; charset=UTF-8" /> for the pages in XML syntax (aka JSP Documents).
  • Change all your servlets to set the content type for responses and to include charset name in the content type to be UTF-8. Use response.setContentType("text/html; charset=UTF-8") or response.setCharacterEncoding("UTF-8").
  • Change any content-generation libraries you use (Velocity, Freemarker, etc.) to use UTF-8 and to specify UTF-8 in the content type of the responses that they generate.
  • Disable any valves or filters that may read request parameters before your character encoding filter or jsp page has a chance to set the encoding to UTF-8.
Mr Lister
Tuan