I'm currently learning JEE, and for an exercise I just need to send text data from one .jsp file to another, using a basic form with the POST method. I want to be able to use accented characters in this form, so I put <%@page pageEncoding="UTF-8" %> at the top of my JSP files. Both files also have the <meta charset="utf-8"> tag, and my IDE (Eclipse) is configured to encode everything in UTF-8.

The problem is that in the end, when I try to display my characters using EL, the accented characters (and presumably any other non-ASCII ones) come out as if they had been decoded as ISO-8859-1.

What is really peculiar is that when I send the data using the GET method, I don't have any problem at all. The same goes when I pass a String in the request via an attribute set in a servlet.

In fact I already solved the problem by sending the request to a servlet and calling request.setCharacterEncoding("utf-8") in its doPost method (note that calling request.getCharacterEncoding() before that returns null), but I'd like to understand what exactly is happening here. I guess it comes from a server misconfiguration, but when I check the web.xml file of my server config I find these lines:

<filter>
    <filter-name>setCharacterEncodingFilter</filter-name>
    <filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <async-supported>true</async-supported>
</filter>

My confusion comes from the fact that nobody ever told me to use request.setCharacterEncoding("utf-8"), and it does not seem normal to me that I should have to. So I guess the question is: do I absolutely have to use it? Why? Shouldn't the encoding be handled by the server configuration?

I'm using Tomcat 9 as the server, and I'm on Ubuntu (I don't know if that helps).

Ooalkman
  • Possible duplicate of [UTF-8 encoding a servlet form submission with Tomcat](https://stackoverflow.com/questions/8391675/utf-8-encoding-a-servlet-form-submission-with-tomcat) – Selaron Nov 20 '18 at 15:39
  • It is indeed the same problem, but it does not really give an answer to my question (which I have clarified by editing my initial post). There are some general concepts that I don't actually understand, and I would like a more complete answer. Maybe I should post on the thread you linked? – Ooalkman Nov 20 '18 at 15:56
  • No, if you still feel it's a different problem it's legit to enhance your question as detailed as possible and hope for an answer. Don't use comment function for chat or extended discussion on other questions. – Selaron Nov 20 '18 at 16:00
  • You are asking for the 'why'. Found a quite detailed explanation here: http://balusc.omnifaces.org/2009/05/unicode-how-to-get-characters-right.html – Selaron Nov 20 '18 at 19:18
  • Thank you so much, that was exactly what I was looking for! – Ooalkman Nov 20 '18 at 21:21

1 Answer

The answer is here (thanks to Selaron): http://balusc.omnifaces.org/2009/05/unicode-how-to-get-characters-right.html

URL-decoding POST request parameters is a story apart. The webbrowser is namely supposed to send the charset used in the Content-Type request header. However, most webbrowsers doesn't do it. Those webbrowsers will just use the same character encoding as the page with the form was delivered with, i.e. it's the same charset as specified in Content-Type header of the HTTP response or the <meta> tag.

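Basically, the problem comes from the browser, which is supposed to state the charset it used in the Content-Type request header but does not. Since Tomcat is given no charset with which to decode the request body, it falls back to ISO-8859-1 by default. GET was not affected because, as far as I understand, the query string is decoded separately (through the connector's URIEncoding setting, which defaults to UTF-8 in recent Tomcat versions). So you have to tell Tomcat that the body is UTF-8, for example by calling request.setCharacterEncoding("UTF-8") before the first parameter is read. Contrary to what I first thought, this can also be done through configuration: the SetCharacterEncodingFilter declaration in Tomcat's conf/web.xml is commented out by default and needs a matching <filter-mapping> to take effect, and since Tomcat 9 implements Servlet 4.0, the application's own web.xml should also accept a <request-character-encoding> element.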
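
To make the mismatch concrete, here is a minimal sketch in plain Java, with no servlet container involved. It assumes the browser URL-encodes the form value as UTF-8 and that the server then decodes it with either charset. The class and method names (FormDecodingDemo, encodeAsBrowser, decodeAsServer) are made up for this illustration, and URLEncoder/URLDecoder only stand in for what the browser and Tomcat do internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: simulates a form value travelling from browser to server.
class FormDecodingDemo {

    // Stand-in for the browser: URL-encode the value using the page's charset (UTF-8).
    static String encodeAsBrowser(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Stand-in for the server: URL-decode the value with whatever charset it assumes.
    static String decodeAsServer(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String typed = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues
        String onTheWire = encodeAsBrowser(typed);

        // The e-acute is sent as the two UTF-8 bytes C3 A9: caf%C3%A9
        System.out.println(onTheWire);

        // Decoding those two bytes as ISO-8859-1 yields two characters (U+00C3, U+00A9).
        System.out.println(decodeAsServer(onTheWire, "ISO-8859-1").length()); // 5 characters

        // Decoding them as UTF-8 restores the single original character.
        System.out.println(decodeAsServer(onTheWire, "UTF-8").equals(typed)); // true
    }
}
```

Calling request.setCharacterEncoding("UTF-8") plays the role of the second argument to decodeAsServer here: it only changes which charset is assumed when the bytes are turned back into characters.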

I guess the author of the lectures I'm following had a different setup, because he never mentioned that problem. Anyway, now I feel a lot better! Thanks a lot!

Ooalkman