3

I'm learning Java, and while writing my first web app I've run into problems with Polish letters (like ą, ć, ł, ś etc.). When I bind an object to a form in GET, it shows fine in the browser, with all Polish letters intact (the database is configured properly). But after hitting the "send" button on the page, my controller receives garbled text in POST, with the Polish letters missing. When the encoding in the view (JSP file) is set to utf-8, the controller gets "Ä" (two bytes) instead of "ą", and with the encoding set to iso-8859-2 I get "±" (one byte). With servlets the solution was to add

request.setCharacterEncoding("8859_2");

as the first line in POST, but Hibernate doesn't use HttpServletRequest, so even when I add it I still get garbage. STS (my IDE) is set to UTF-8.

Is there any solution to this?

Browser log as asked:

Request URL:http://localhost:8080/Project/register
Request Method:POST
Status Code:200 
Remote Address:[::1]:8080
Referrer Policy:no-referrer-when-downgrade
Response Headers
Content-Language:pl-PL
Content-Length:3338
Content-Type:text/html;charset=UTF-8
Date:Mon, 29 Jan 2018 11:30:04 GMT
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding:gzip, deflate, br
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:58
Content-Type:application/x-www-form-urlencoded
Cookie:JSESSIONID=88145A5FCBBD13FDBE3C288110B38187
DNT:1
Host:localhost:8080
Origin:http://localhost:8080
Referer:http://localhost:8080/Project/register
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.102 Safari/537.36 Vivaldi/1.94.971.8
Form Data
view URL encoded
username:ąąą
email:
age:0
phone:0
password:

And after clicking view URL encoded:

%C4%85%C4%85%C4%85
Roman C
Witcher
  • Any browser debug output to show which encoding the form is actually being submitted in? Did you specify UTF-8 encoding in your controller POST mapping (ex: `produces = "text/plain;charset=UTF-8"`) as suggested in [this answer](https://stackoverflow.com/a/12023816/4660500)? – Simon Berthiaume Jan 28 '18 at 02:26
  • Hi. I added the browser log to the first post. Changing the controller mapping to @PostMapping(value = "/register", produces = "text/plain;charset=UTF-8") unfortunately didn't change anything. I'm running out of ideas... – Witcher Jan 29 '18 at 11:37

2 Answers

2

You should set a Content-Type header with a charset for your page; the browser then uses that charset when it submits the form in the POST request.

See Setting the HTTP charset parameter:

Documents transmitted with HTTP that are of type text, such as text/html, text/plain, etc., can send a charset parameter in the HTTP header to specify the character encoding of the document.

It is very important to always label Web documents explicitly. HTTP 1.1 says that the default charset is ISO-8859-1. But there are too many unlabeled documents in other encodings, so browsers use the reader's preferred encoding when there is no explicit charset parameter.

The line in the HTTP header typically looks like this:

Content-Type: text/html; charset=utf-8

In theory, any character encoding that has been registered with IANA can be used, but there is no browser that understands all of them. The more widely a character encoding is used, the better the chance that a browser will understand it. A Unicode encoding such as UTF-8 is a good choice for a number of reasons.


I'm not sure that a specific Content-Type charset alone can fix the encoding problem while the page is transferred over HTTP, but the correct charset should still be set for the request.

Header set Content-Type "text/html; charset=iso-8859-2" 

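UTF-8 is a character encoding commonly used for characters beyond the ASCII character set. Many servers are configured to use UTF-8 by default, but that is not guaranteed: under the Servlet specification, a container falls back to ISO-8859-1 for the request body when no encoding is specified.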

Roman C
  • Hi. I added the browser log to the first post. The encoding in the HTML file is set in about three places: in the first line: <%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%> and later in If I change it to iso-8859-2, it is still mapped to UTF-8 in the controller and I still get the wrong result. – Witcher Jan 29 '18 at 11:42
0

The problem was solved by setting up a filter and reconfiguring Tomcat a bit, as described here: https://stackoverflow.com/a/40484064/8783698
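
For anyone wondering why a filter helps, here is a minimal sketch using only the JDK (no servlet container). It decodes the exact bytes from the browser log above with different charsets. The class name `FormDecodingDemo` and the helper `decode` are made up for illustration. It also assumes that the filter's effect is equivalent to calling request.setCharacterEncoding("UTF-8") before any parameter is read; I have not reproduced the linked answer's code here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodingDemo {

    // Hypothetical helper: decodes a percent-encoded form value with the
    // given charset, roughly what a container does when it parses the body.
    static String decode(String urlEncoded, String charset) {
        try {
            return URLDecoder.decode(urlEncoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Bytes sent by the browser for the username (from the log above).
        String sent = "%C4%85%C4%85%C4%85";

        // Decoded as UTF-8: three characters, each U+0105 (a with ogonek).
        String asUtf8 = decode(sent, "UTF-8");
        System.out.println(asUtf8.length());                      // 3
        System.out.println(asUtf8.equals("\u0105\u0105\u0105"));  // true

        // Decoded as ISO-8859-1 (the fallback when no request encoding is
        // set): every byte becomes its own character, so 0xC4 turns into
        // U+00C4 (A with diaeresis) followed by a control character.
        String asLatin1 = decode(sent, "ISO-8859-1");
        System.out.println(asLatin1.length());                    // 6
        System.out.println(asLatin1.charAt(0) == '\u00C4');       // true

        // A page in ISO-8859-2 submits the same letter as the single byte
        // 0xB1, which ISO-8859-1 reads as U+00B1 (plus-minus sign).
        System.out.println(decode("%B1", "ISO-8859-1").equals("\u00B1")); // true
        System.out.println(decode("%B1", "ISO-8859-2").equals("\u0105")); // true
    }
}
```

This matches both symptoms from the question: the bytes on the wire were fine, and only the charset used to decode them on the server side was wrong.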

Witcher