I am working on a Java application. I need to get UTF-8 encoding in my Java webapp to support Bengali (বাংলা) text. I have done the following:

Tomcat's server.xml

<Connector port="8080"
    protocol="HTTP/1.1"
    connectionTimeout="20000"
    redirectPort="8443"
    URIEncoding="UTF-8" />

<Connector executor="tomcatThreadPool"
    port="8080"
    protocol="HTTP/1.1"
    connectionTimeout="20000"
    redirectPort="8443"
    URIEncoding="UTF-8" />

<Connector protocol="AJP/1.3"
    address="::1"
    port="8009"
    redirectPort="8443"
    URIEncoding="UTF-8" />

JVM defaultCharset in catalina.bat file

set JAVA_OPTS=%JAVA_OPTS% -Dfile.encoding=UTF-8

properties in application.properties

spring.datasource.url=jdbc:mysql://localhost:3306/database_name?useUnicode=true\&characterEncoding=UTF-8
spring.datasource.tomcat.connection-properties=useUnicode=true;characterEncoding=UTF-8

spring.http.encoding.charset=UTF-8
spring.http.encoding.enabled=true
spring.http.encoding.force=true

server.tomcat.uri-encoding=UTF-8
spring.webflux.multipart.headers-charset=UTF-8
spring.thymeleaf.encoding=UTF-8

meta tag in html file

<!doctype html>
<html lang="en" xmlns:th="http://www.thymeleaf.org" xmlns:sec="http://www.thymeleaf.org/extras/spring-security">
    <head>
        <meta charset="utf-8">
    </head>

    <body>
    </body>
</html>

utf-8 support in form tag

<form enctype="multipart/form-data" accept-charset="UTF-8" action="#" th:action="@{/create}" th:object="${object}" th:method="POST">
    <div class="form-group">
        <label for="name" class="col-form-label">Name</label>
        <input type="text" class="form-control" id="name" name="name" th:field="*{name}" placeholder="Enter Name">
    </div>
    <div class="form-group">
        <label for="photo">Photo</label>
        <input type="file" class="form-control-file" id="photo" name="photo"/>
    </div>
    <div>
        <button class="btn" type="submit">Submit</button>
    </div>
</form>

MySQL configuration (my.ini)

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

MySQL properties:

Database:
Default collation: utf8mb4_0900_ai_ci
Default character set: utf8mb4

Table:
Table collation: utf8mb4_0900_ai_ci

Column:
Type: varchar(255)
Character Set: utf8mb4
Collation: utf8mb4_0900_ai_ci

Configuration:

  • Java 11.0.2
  • Tomcat 8.5
  • MySQL 8.0.16
  • Spring Boot 2.2.4
  • Maven 3.8.1
  • Windows Server 2019 Standard (Production) + Windows 10 Home (Development)

When I submit a form with value আনোয়ার, it is saved as আনোয়ার

How can I solve this problem?

When I run the application from Eclipse it works fine, but when the WAR file is deployed to the Tomcat server it does not.

I tried the following code. It prints আনোয়ার in the tomcat8-stdout file, so I think the problem occurs while transferring data from the browser to the server; the transfer from the server to the database is fine.

@PostMapping("/create")
public String create(@ModelAttribute("object") Object object, @RequestParam("photo") MultipartFile photo) throws IOException {
    System.out.println(object.getName());
    return "redirect:/index";
}

3 Answers

As you are using Spring, you can try using CharacterEncodingFilter to enforce UTF-8 encoding.

You can find multiple examples on how to do that. Consider for instance this or this other.

Basically, you need to register the filter somewhere in your Java configuration. Taking one of the indicated examples:

@Bean
@Order(Ordered.HIGHEST_PRECEDENCE)
public FilterRegistrationBean<CharacterEncodingFilter> characterEncodingFilterRegistration() {
  CharacterEncodingFilter filter = new CharacterEncodingFilter();
  filter.setEncoding("UTF-8"); // use your preferred encoding
  filter.setForceEncoding(true); // force the encoding

  FilterRegistrationBean<CharacterEncodingFilter> registrationBean =
    new FilterRegistrationBean<>(filter); // register the filter
  registrationBean.addUrlPatterns("/*"); // set preferred url
  return registrationBean;
}

In fact, Spring Boot's HttpEncodingAutoConfiguration should perform this registration automatically. Note the conditions it declares:

@Configuration(proxyBeanMethods=false)
@EnableConfigurationProperties(value=ServerProperties.class)
@ConditionalOnWebApplication(type=SERVLET)
@ConditionalOnClass(value=org.springframework.web.filter.CharacterEncodingFilter.class)
@ConditionalOnProperty(prefix="server.servlet.encoding",
                       value="enabled",
                       matchIfMissing=true)

As you can see, the registration of the filter is related to properties with the server.servlet.encoding prefix.

Therefore, as an alternative way to configure the charset filter, you could set the server.servlet.encoding.* properties:

server.servlet.encoding.charset=UTF-8
server.servlet.encoding.force=true

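instead of the older spring.http.encoding.* properties you are currently using in your application.properties. Note that the server.servlet.encoding.* names apply from Spring Boot 2.3 onward; on 2.2.x the spring.http.encoding.* names are still the ones that are read.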

The problem could be related to the processing of multipart requests. Although it is advisable to use the default mechanisms provided by Spring Boot and the underlying container, one thing you could try is to use Commons FileUpload to handle the file upload, and configure it to process the headers and form fields as UTF-8. This can be done as follows.

First, include the commons-fileupload dependency in your pom.xml if you use Maven (or the equivalent declaration if you use Gradle):

<dependency>
    <groupId>commons-fileupload</groupId>
    <artifactId>commons-fileupload</artifactId>
    <version>1.4</version>
</dependency>

Then, in any place in your Java configuration, include the following bean:

@Bean(name = "multipartResolver")
public CommonsMultipartResolver multipartResolver() {
    CommonsMultipartResolver multipartResolver = new CommonsMultipartResolver();
    // Note how we set the encoding
    multipartResolver.setDefaultEncoding("UTF-8");
    return multipartResolver;
}

As you can see, we set the default encoding property explicitly. Its documentation reads:

Set the default character encoding to use for parsing requests, to be applied to headers of individual parts and to form fields. Default is ISO-8859-1, according to the Servlet spec.
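
To illustrate why that ISO-8859-1 default matters, here is a minimal sketch (not taken from the question's code; the class and method names are made up for the example) of what happens when UTF-8 bytes are decoded as ISO-8859-1, and how the mistake can be reversed. The sample text is written with Unicode escapes so that the source file's own encoding cannot interfere.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Decode the UTF-8 bytes of the text as ISO-8859-1,
    // which is what a server falling back to the default would do.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverse the mistake: recover the raw bytes and decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u0986\u09A8"; // the first two letters of the sample name
        String garbled = misdecode(original);

        // Each Bengali letter takes 3 bytes in UTF-8, so 2 letters become 6 chars.
        System.out.println(garbled.length());                  // 6
        System.out.println(repair(garbled).equals(original));  // true
    }
}
```

If the value stored in your database has this "three odd characters per letter" shape, the bytes were most likely decoded with a single-byte charset somewhere between the browser and your controller.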

CommonsMultipartResolver provides different methods you can use to customize the upload behavior as necessary.

Apart from these tips, and the suggestion of @Olivier in his answer, at first glance it looks like you configured everything correctly. In any case, consider reading this related SO question: although it is about PHP, it can provide valuable information.

According to your comments, the information seems to be transmitted correctly between your server and the database, so try debugging the communication between your HTML pages and the server.

A valuable tool for that is the Network tab of the browser inspector: it lets you see what your page submits to the server. Browsers generally show the exchanged data "as is", in the actual encoding in which it is sent.
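
For a plain application/x-www-form-urlencoded form (not the multipart one), the Network tab shows the field as a percent-encoded string. As a rough sketch, you could paste that string into a small program like the following to check which charset it decodes correctly with. The class name and the sample value are assumptions made for the example; the URLDecoder and URLEncoder overloads that take a Charset require Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormBodyCheck {

    // Decode a percent-encoded form value with the given charset.
    static String decode(String percentEncoded, Charset charset) {
        return URLDecoder.decode(percentEncoded, charset);
    }

    public static void main(String[] args) {
        // What a browser is expected to send for a UTF-8 form field.
        String sent = URLEncoder.encode("\u0986\u09A8", StandardCharsets.UTF_8);
        System.out.println(sent); // %E0%A6%86%E0%A6%A8

        // Decoding with the right charset restores the 2 letters;
        // decoding as ISO-8859-1 yields 6 unrelated characters.
        System.out.println(decode(sent, StandardCharsets.UTF_8).length());      // 2
        System.out.println(decode(sent, StandardCharsets.ISO_8859_1).length()); // 6
    }
}
```

If the captured string already decodes correctly as UTF-8, the browser side is fine and the wrong charset is being applied on the server.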

Another valuable tool for the same purpose could be a network traffic analyzer like Wireshark or Fiddler.

Unless you can debug your code remotely and inspect the variable values, please do not rely on the output of System.out: by the time you read it in a file, so many factors are involved (the JVM default charset, the encoding of the log file, the viewer you open it with) that it may well give you wrong information.

While looking for information about this issue I came across this excellent article. In particular, it provides an example of analyzing the individual code points that make up your String: compared with writing the text directly to System.out, this kind of analysis gives much more reliable information.
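
As a minimal sketch of that idea (my own illustration, not code from the article; the class and method names are made up), you could log the code points of the received value instead of the text itself. The output is plain ASCII, so it reads the same whatever encoding the console or log file uses.

```java
import java.util.stream.Collectors;

public class CodePointDump {

    // Render each code point of the text as U+XXXX, separated by spaces.
    static String describe(String text) {
        return text.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Two Bengali letters, written as escapes to keep the source ASCII-only.
        System.out.println(describe("\u0986\u09A8")); // U+0986 U+09A8
    }
}
```

In your controller you would call describe(object.getName()). Values in the U+0980 to U+09FF range mean the Bengali text arrived intact; values such as U+00E0 U+00A6 mean it was already decoded with the wrong charset before reaching your code.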

  • I have changed my properties. It didn't work. Do I need to both register the filter and change the application properties? – Partho63 Jan 09 '22 at 06:59
  • I think I have made a mess trying out all the tutorials/examples out there. Can you please give me a link where all the minimum necessary steps are documented in order? So that I can start from scratch and follow the link. – Partho63 Jan 09 '22 at 07:01
  • Thank you for the feedback @Partho63. Well, in fact, I think you have already configured everything properly. I will try to find any useful additional links or to clarify the process in the answer in any way. In any case, I updated the answer with a possible alternative based on the configuration of `CommonsMultipartResolver`. Please, could you try the suggested change? I hope it helps. – jccampanero Jan 09 '22 at 12:00
  • @Partho63 Were you able to try the `CommonsMultipartResolver` suggested approach? I updated the answer with further tips, although as I told you before in my previous comments, your setup looks fine to me. I hope any of the suggested information helps you to solve the problem. – jccampanero Jan 10 '22 at 22:32
  • I didn't try `CommonsMultipartResolver`. Instead I submitted another simple form that does not have any image/file input. That form acts the same way (doesn't support UTF-8). So I think the problem is not related to MultipartResolver. – Partho63 Jan 12 '22 at 10:34
  • Thank you for all the tips and resources. I will read those for sure. If I find any solution I will let you know. – Partho63 Jan 12 '22 at 10:38
  • You are welcome @Partho63. I hope any of the suggested approaches work for you. – jccampanero Jan 12 '22 at 19:05

The encoding of POST parameters is not set at the Connector level, but on the ServletRequest object.

Tomcat provides a filter to set it, as explained in the documentation.

Add this to your web.xml file:

<filter>
  <filter-name>setCharacterEncodingFilter</filter-name>
  <filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
</filter>

<filter-mapping>
  <filter-name>setCharacterEncodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>

Did you try setting the following in Tomcat's server.xml to allow special characters in URLs?

<Connector port="8080"
    protocol="HTTP/1.1"
    connectionTimeout="20000"
    relaxedPathChars="[]|"
    relaxedQueryChars="&#x5B;&#x5D;&#x7C;&#x7B;&#x7D;&#x5E;&#x5C;&#x60;&#x22;&#x3C;&#x3E;"
    redirectPort="8443" />