
Before you mark my question as a duplicate of this question, I want to say that I have read it, and also this post.

However, I am still doing something wrong: the data submitted from the form in my JSP (with the POST method) arrives garbled. What I have done:

1. In my JSP, I have put this

<%@page contentType="text/html" pageEncoding="UTF-8" language="java" %>

and in the header this <meta charset="UTF-8">

2. In the servlet

protected void processRequest(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {

        //...        
        //code
        //...

        request.setCharacterEncoding("UTF-8");

        /*if (request.getCharacterEncoding() == null) {
                request.setCharacterEncoding("UTF-8");
         }
       */
       //...
       //code
       //...

       s1 = request.getParameter(kname1); //<-here I read the value from the JSP and get finally this ÎÏδÏÎ±Î´Î±Ï 

}

3. In the web.xml I have <?xml version="1.0" encoding="UTF-8"?>

What am I missing here?

yaylitzis
  • try this : `s1 = new String(request.getParameter(kname1).getBytes(),"utf-8");` – nafas Jul 17 '15 at 15:55
  • also print this : `request.getParameter(kname1).getBytes();` , then you may figure out the problem from the bytes. (remove `if (request.getCharacterEncoding() == null) { request.setCharacterEncoding("UTF-8"); }` – nafas Jul 17 '15 at 15:57
  • Are you posting a form? Is `kname1` expected to be in the body or in the query string? – Sotirios Delimanolis Jul 17 '15 at 15:57
  • @SotiriosDelimanolis `kname1` is a String variable.. A Greek word I am trying to read... @nafas I had a little improvement. From this ÎÏδÏÎ±Î´Î±Ï now i get α�?δα�?δ . In the log I got [B@24803429 – yaylitzis Jul 17 '15 at 16:02
  • I'm asking where does it come from? – Sotirios Delimanolis Jul 17 '15 at 16:05
  • I have a form in my JSP where I have a text input `` – yaylitzis Jul 17 '15 at 16:09
  • Is it a get or a post? – Sotirios Delimanolis Jul 17 '15 at 16:17
  • It's a `POST` method. I mention it also in question – yaylitzis Jul 17 '15 at 16:43
  • You did nowhere confirm that you've set the request encoding **before** the request body is parsed. Depending on how you observed the result (e.g. `System.out.println(s1)`), you did also nowhere exclude its encoding from being the actual culprit. Both links you found have it covered. Please confirm. – BalusC Jul 17 '15 at 19:51
  • And, please ignore nonsense posted by nafas. It would only lead you in completely wrong direction. – BalusC Jul 17 '15 at 19:57
  • 1. Can you verify that the browser got a page in UTF-8 (e.g. in Firefox, right mouse click, View Page Info -> Text Encoding) 2. Can you verify that - when submitting the form - the browser sends a HTTP header which specifies UTF-8 as character encoding (e.g. using the network-tab in Firebug) – wero Jul 17 '15 at 22:00
  • @BalusC Indeed, I haven't set the request encoding anywhere else. I have only done what I wrote in the question body. In your answer you say to create a filter for `POST` requests. Because I have never used a filter, can you guide me? – yaylitzis Jul 21 '15 at 16:00
  • So the current question is irrelevant and you're basically asking "How do I create a servlet filter?" – BalusC Jul 21 '15 at 16:19
  • Well, the answer to this question is your answer here: http://stackoverflow.com/questions/2630748/how-to-enable-reading-non-ascii-characters-in-servlets. As for how to create a servlet filter, I googled it and I am reading up on it. – yaylitzis Jul 21 '15 at 16:22
  • @BalusC you were right. I created a filter and it Worked. Thx! – yaylitzis Jul 23 '15 at 08:50

1 Answer

Is the HTML right?

You seem to be using HTML5

<%@page contentType="text/html" pageEncoding="UTF-8" language="java"
%><!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>...</title>
  </head>
  <body>
    <form accept-charset="UTF-8" ...>
    ...
  </body>
</html>

The accept-charset attribute on the form tag allows UTF-8 input to be sent to the server unescaped. Otherwise you might get numeric character references such as &#8085;. (This does not seem to be the case here.)

You might check some things, like

<input name="test" type="hidden" value="\u0109ĉ">

Should give "ĉĉ".

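Forms with method POST and method GET are handled differently, so test both before setting the encoding handling in stone.

Is the request encoding right on the server?

Check that request.getCharacterEncoding() == null before you set it. If it is not null, something upstream, such as a filter, may have set an encoding already. The encoding can only be set before the request parameters or body are first read; after that, request.setCharacterEncoding has no effect. The servlet may also have been reached through a forward, after the parameters were already parsed.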
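To illustrate why a missing or late request.setCharacterEncoding("UTF-8") yields text like the one in the question, here is a minimal sketch that needs no servlet container. It assumes the container falls back to ISO-8859-1 when no request encoding is set in time. The class name, the method names, and the sample word are hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {

    /** Decodes the UTF-8 bytes of the text as ISO-8859-1 (assumed container fallback). */
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Reverses misdecode. Useful for diagnosis only, not as a fix. */
    static String undo(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u03b1\u03b2\u03b3"; // Greek alpha, beta, gamma
        String garbled = misdecode(original);
        // Each Greek letter becomes two Latin-1 characters, the first being U+00CE.
        System.out.println(garbled.length()); // prints 6
        System.out.println(undo(garbled).equals(original)); // prints true
    }
}
```

The round trip in undo only shows that the bytes arrived intact and were decoded with the wrong charset. The place to correct that is the request encoding, set before the first parameter is read, not a re-encoding after the fact.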

Conversion might happen anywhere, especially when looking at the text. So dump the request parameter as purely as possible:

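// Print each char of the parameter as a \\uXXXX escape, bypassing console encoding.
String s = request.getParameter("test");
for (char ch : s.toCharArray()) {
    System.out.printf("\\u%04x ", 0xFFFF & (int) ch);
}
System.out.println();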
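A self-contained variant of the dump above, as a sketch: it returns the escapes as a string so they can be logged or compared. The class and method names are hypothetical. Reading the output is straightforward: a correctly decoded Greek alpha shows up as a single value 03b1, while a value decoded with the wrong charset shows up as a pair such as 00ce 00b1.

```java
final class CharDump {

    /** Formats every UTF-16 code unit of s as a backslash-u escape with four hex digits. */
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (char ch : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("\\u%04x", (int) ch));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Correctly decoded alpha: one code unit, U+03B1.
        System.out.println(dump("\u03b1"));
        // Alpha whose UTF-8 bytes were read as ISO-8859-1: U+00CE followed by U+00B1.
        System.out.println(dump("\u00ce\u00b1"));
    }
}
```

Because the output is pure ASCII, it is not affected by the encoding of the console or log file, which is the point of dumping the parameter this way.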
Joop Eggen
  • I created a filter and it worked. – yaylitzis Jul 23 '15 at 08:51
  • Meta charset and form accept-charset are pointless in HTML pages served over HTTP. As OP is using JSP/Servlets, it's most likely that OP is using HTTP to serve HTML pages. The average sane HTTP client will then use the charset specified in HTTP response header and ignore the ones specified in HTML. As OP (correctly) used `pageEncoding="UTF-8"` in JSP, it's definitely present in the HTTP response header. Moreover, even if this was absent, form accept-charset is only considered by MSIE browser and even then it is doing it wrong (it incorrectly substitutes ISO-8859-1 as CP-1252). Never use it. – BalusC Jul 23 '15 at 08:58
  • So, a downvote for spreading this misinformation. – BalusC Jul 23 '15 at 09:03
  • @BalusC yes, HTTP headers override meta information in HTML. Still they make sense, for instance when saving the HTML on disk. ISO-8859-1 (Latin-1) is almost in every browser understood as CP-1252 (Windows Latin-1) even on Linux and Mac. HTML 5 even prescribes this interpretation. Though a good reason to downvote, is that this seems to be a duplicate question. And this is not a sufficient answer. – Joop Eggen Jul 23 '15 at 09:13