60

I'm uploading a file to the server. The file upload HTML form has 2 fields:

  1. File name - An HTML text box where the user can give a name in any language.
  2. File upload - An HTML 'file' input where the user can specify a file from disk to upload.

When the form is submitted, the file contents are received properly. However, when the file name (point 1 above) is read, it is garbled. ASCII characters are displayed properly. When the name is given in some other language (German, French etc.), there are problems.

In the servlet method, the request's character encoding is set to UTF-8. I even tried a filter as described in "How can I make this code to submit a UTF-8 form textarea with jQuery/Ajax work?", but it doesn't seem to work. Only the file name seems to be garbled.

The MySQL table where the file name goes supports UTF-8. I gave random non-English characters & they are stored/displayed properly.

Using Fiddler, I monitored the request & all the POST data is passed correctly. I'm trying to identify how/where the data could get garbled. Any help will be greatly appreciated.

  • I benefited from http://stackoverflow.com/questions/2422468/how-to-upload-files-to-server-using-jsp-servlet/2424824#2424824 -- to be specific, it was the `@MultipartConfig` solution that worked for me (I do need to `new String(....getBytes(...), ...)` in addition to that). The other solutions listed here so far unfortunately did not work for me alone :/ – Vin Dec 08 '14 at 17:21

14 Answers

58

I had the same problem using Apache commons-fileupload. I did not find out what causes it, especially since I have UTF-8 set in the following places:

  1. HTML meta tag
  2. Form accept-charset attribute
  3. Tomcat filter on every request that sets the "UTF-8" encoding

My solution was to explicitly convert the strings from ISO-8859-1 (or whatever the default encoding of your platform is) to UTF-8:

new String (s.getBytes ("iso-8859-1"), "UTF-8");

hope that helps

Edit: starting with Java 7 you can also use the following:

new String (s.getBytes (StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
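
A minimal sketch of why this re-decoding works, assuming the browser really sent UTF-8 bytes and the server decoded them as ISO-8859-1. The class name `MojibakeRepair` and the method `repair` are made up for illustration; only JDK classes are used.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    /**
     * Re-decodes a string that was wrongly decoded as ISO-8859-1
     * although the underlying bytes were UTF-8.
     */
    public static String repair(String garbled) {
        // ISO-8859-1 maps every char 0..255 back to the same byte value,
        // so this recovers the original bytes exactly.
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "M\u00fcller"; // "Müller", written with an escape to be source-encoding safe

        // What the browser sends: UTF-8 bytes.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        // What the server produces when it assumes ISO-8859-1.
        String garbled = new String(wire, StandardCharsets.ISO_8859_1);

        System.out.println(garbled.equals(original));         // false
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

Note that the conversion is only safe on strings that were actually mis-decoded. Applying it to a string that was already decoded correctly will most likely damage it, which is the concern raised in the comments below.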
Philip Helger
  • Michael, I referred the source code. MySQL's character encoding is set to UTF-8 & pageEncoding attribute is already in the JSP. Moreover, as per Paul's accept-charset attribute is also *not* set in the form tag. But somehow the browser doesn't send UTF-8 data. Phax's soln worked out. –  Feb 16 '09 at 07:09
  • What could happen if commons-fileupload fixed this and the request is in UTF-8? Perhaps when you execute s.getBytes ("iso-8859-1") the bytes are not in the iso-8859-1 encoding. – David García González Nov 09 '09 at 13:13
  • 1
    Not sure if this will help, but commons-fileupload (at least v1.2.1) has logic to default to the platform encoding if you don't configure another value. Take a look at `org.apache.commons.fileupload.FileUploadBase` and the `headerEncoding` field. – matt b May 10 '12 at 18:00
  • When converting Russian characters, this worked as a charm. Just great! Thanks. – Khasan 24-7 Dec 03 '15 at 10:11
  • 3
    You can also use `new String(s.getBytes(Charset.defaultCharset()), "UTF-8")` – Lucas Basquerotto Apr 13 '18 at 14:33
  • 1
    Philip, StandardCharsets was introduced in Java 7, not 8 ;-) – winne2 Nov 26 '19 at 09:33
  • @DavidGarcíaGonzález it seems it's fair to anticipate that it is _always_ iso-8859-1 according to the [RFC 6266](https://tools.ietf.org/html/rfc6266#section-4.3) but that also points at a 'proper' solution which would be if the client uses `filename*` (note the `*`) instead of `filename`, and specifies the charset as utf-8, the browser and server should get it right [RFC 5987](https://tools.ietf.org/html/rfc5987) – Rhubarb May 04 '21 at 18:27
29

Just use Apache commons upload library. Add URIEncoding="UTF-8" to Tomcat's connector, and use FileItem.getString("UTF-8") instead of FileItem.getString() without charset specified.

Hope this helps.

nautilusvn
  • 5
    this should be upvoted, nothing else solves the problem... even tried with filters and domain/container xml files etc.. doing getString("UTF-8") solves even if everything else is not done... – Pradyut Bhattacharya Jan 29 '14 at 21:24
  • 2
    FileItem.getString("UTF-8") was the solution for me – DLight Jun 09 '15 at 15:42
  • 1
    It is true, It works if one uses Apache commons-fileupload module: http://commons.apache.org/proper/commons-fileupload/using.html I will use it just because it solves the problem. – Mariusz Jaskółka Jun 22 '15 at 14:20
  • This is it, this is the correct answer. This helped me handle unicode characters with attachments. Thanks a lot @nautilusvn you saved my day! – Sachidananda Naik Sep 14 '21 at 19:18
21

I got stuck with this problem and found that it was the order of the call to

request.setCharacterEncoding("UTF-8");

that was causing the problem. It has to be called before any call to request.getParameter(), so I made a special filter to use at the top of my filter chain.

https://rogerkeays.com/servletrequest-setcharactercoding-ignored
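
To illustrate why the order matters, here is a simplified model rather than real container code. `LazyRequestModel` is a hypothetical stand-in for a servlet request, not part of the servlet API. It assumes that the container parses parameters once, on first access, with whatever encoding is set at that moment, and that ISO-8859-1 is the default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified stand-in for a servlet request; NOT the real servlet API. */
public class LazyRequestModel {
    private final byte[] rawBody;   // bytes of one field value as sent by the browser
    private Charset encoding = StandardCharsets.ISO_8859_1; // assumed container default
    private String parsedValue;     // cached after the first read

    public LazyRequestModel(byte[] rawBody) {
        this.rawBody = rawBody;
    }

    /** Has no effect once the value has been parsed, mirroring the behaviour described above. */
    public void setCharacterEncoding(Charset charset) {
        if (parsedValue == null) {
            encoding = charset;
        }
    }

    /** Decodes the body on first access and caches the result. */
    public String getParameter() {
        if (parsedValue == null) {
            parsedValue = new String(rawBody, encoding);
        }
        return parsedValue;
    }

    public static void main(String[] args) {
        String sent = "Gr\u00fc\u00dfe"; // "Grüße"
        byte[] body = sent.getBytes(StandardCharsets.UTF_8);

        LazyRequestModel tooLate = new LazyRequestModel(body);
        tooLate.getParameter();                               // an earlier filter reads a parameter
        tooLate.setCharacterEncoding(StandardCharsets.UTF_8); // ignored: already parsed
        System.out.println(tooLate.getParameter().equals(sent)); // false

        LazyRequestModel inTime = new LazyRequestModel(body);
        inTime.setCharacterEncoding(StandardCharsets.UTF_8);  // set before the first read
        System.out.println(inTime.getParameter().equals(sent));  // true
    }
}
```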

Roger Keays
14

I had the same problem and it turned out that in addition to specifying the encoding in the Filter

request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");

it is necessary to add "accept-charset" to the form

<form method="post" enctype="multipart/form-data" accept-charset="UTF-8">

and run the JVM with

-Dfile.encoding=UTF-8

The HTML meta tag is not necessary if you send it in the HTTP header using response.setCharacterEncoding().
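
A small sketch to check what the JVM flag changes, assuming an older JDK where the platform default charset follows `file.encoding` (newer JDKs default to UTF-8 anyway). The class name `DefaultCharsetCheck` and the helper `encodedLength` are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {
    /** Number of bytes the text occupies when encoded with the given charset. */
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String name = "\u00fc"; // "ü"

        // Depends on the platform and on -Dfile.encoding, so no fixed output is shown here.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("bytes with platform default: " + name.getBytes().length);

        // Explicit charsets give the same result on every machine.
        System.out.println(encodedLength(name, StandardCharsets.UTF_8));      // 2
        System.out.println(encodedLength(name, StandardCharsets.ISO_8859_1)); // 1
    }
}
```

Any code path that calls `getBytes()` or `new String(bytes)` without a charset picks up the platform default, which may be why the flag makes a difference in some setups.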

Kevin Rahe
Dan
  • 4
    I'm using Glassfish 3.1.1, and while running the JVM with `-Dfile.encoding=UTF-8` is necessary, I didn't need a filter. **However**, simply adding the `acceptcharset` attribute to the `<form>` tag didn't correct the problem. Instead, I had to add the charset identifier to the `enctype` attribute, as in: `<form enctype="multipart/form-data; charset=UTF-8">`. – Kevin Rahe Mar 06 '13 at 16:43
  • 2
    the -Dfile.encoding=UTF-8 parameter is important. – Jasper Jul 30 '14 at 11:51
9

In case someone stumbles upon this problem while working on a Grails (or pure Spring) web application, here is the post that helped me:

http://forum.spring.io/forum/spring-projects/web/2491-solved-character-encoding-and-multipart-forms

To set default encoding to UTF-8 (instead of the ISO-8859-1) for multipart requests, I added the following code in resources.groovy (Spring DSL):

multipartResolver(ContentLengthAwareCommonsMultipartResolver) {
    defaultEncoding = 'UTF-8'
}
  • 3
    At [another question in the Spring context](http://stackoverflow.com/questions/9055025/how-to-change-the-character-encoding-for-servlet-3-0-spring-mvc-multipart-upload), they mentioned that the `MultipartResolver` has a default decoding charset of ISO-8859-1. See official Spring docs here: [CommonsFileUploadSupport#setDefaultEncoding](https://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/web/multipart/commons/CommonsFileUploadSupport.html#setDefaultEncoding-java.lang.String-). – easoncxz Mar 17 '16 at 01:25
  • xml format: – zoirs Jan 21 '20 at 08:34
3

I'm using org.apache.commons.fileupload.servlet.ServletFileUpload.ServletFileUpload(FileItemFactory) and defining the encoding when reading out parameter value:

List<FileItem> items = new ServletFileUpload(new DiskFileItemFactory()).parseRequest(request);

for (FileItem item : items) {
    String fieldName = item.getFieldName();

    if (item.isFormField()) {
        // Decode the text field explicitly as UTF-8 instead of relying on the default
        String fieldValue = item.getString("UTF-8"); // <-- HERE
    }
}
rghome
2

The filter is key for IE. A few other things to check:

What is the page encoding and character set? Both should be UTF-8

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

What is the character set in the meta tag?

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Does your MySQL connection string specify UTF-8? e.g.

jdbc:mysql://127.0.0.1/dbname?requireSSL=false&useUnicode=true&characterEncoding=UTF-8
Michael Glenn
1

I am using PrimeFaces with GlassFish and SQL Server.

In my case I created a WebFilter on the back end that intercepts every request and sets its encoding to UTF-8, like this:

package br.com.teste.filter;

import java.io.IOException;

import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;

@WebFilter(servletNames={"Faces Servlet"})
public class Filter implements javax.servlet.Filter {

    @Override
    public void destroy() {
        // TODO Auto-generated method stub

    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);      
    }

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // TODO Auto-generated method stub      
    }

}

In the view (.xhtml) I need to set the charset in the form's enctype parameter to UTF-8, as @Kevin Rahe did:

    <h:form id="frmt" enctype="multipart/form-data;charset=UTF-8" >
         <!-- your code here -->
    </h:form>  
Weles
0

I had the same problem. The only solution that worked for me was adding <property name="defaultEncoding" value="UTF-8"/> to the multipartResolver bean in the Spring configuration file.

aManjate
0

You also have to make sure that your encoding filter (org.springframework.web.filter.CharacterEncodingFilter) in your web.xml is mapped before the multipart filter (org.springframework.web.multipart.support.MultipartFilter).

Romain VDK
0

The filter and setting up Tomcat to support UTF-8 URIs are only important if you're passing the data via the URL's query string, as you would with an HTTP GET. If you're using a POST, with a query string in the HTTP message's body, what matters is the content type of the request, and it is up to the browser to set the content type to UTF-8 and send the content with that encoding.

The only way to really do this is by telling the browser that you can only accept UTF-8 by setting the Accept-Charset header on every response to "UTF-8;q=1,ISO-8859-1;q=0.6". This will put UTF-8 as the best quality and the default charset, ISO-8859-1, as acceptable, but a lower quality.

When you say the file name is garbled, is it garbled in the HttpServletRequest.getParameter's return value?

nbeyer
0

I think I'm late to the party, but if you use WildFly, you can add a default-encoding to standalone.xml. Just search standalone.xml for

<servlet-container name="default"> 

and add encoding like this:

<servlet-container name="default" default-encoding="UTF-8">
Patrick P
0

To avoid converting all request parameters manually to UTF-8, you can define a method annotated with @InitBinder in your controller:

@InitBinder
protected void initBinder(WebDataBinder binder) {
    binder.registerCustomEditor(String.class, new CharacterEditor(true) {
        @Override
        public void setAsText(String text) throws IllegalArgumentException {
            String properText = new String(text.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
            setValue(properText);
        }
    });
}

The above will automatically convert all request parameters to UTF-8 in the controller where it is defined.

Vlad
-1

You do not use UTF-8 to encode text data for HTML forms. The HTML standard defines two encodings, and the relevant part of that standard is here. The "old" encoding, which handles ASCII, is application/x-www-form-urlencoded. The new one, which works properly, is multipart/form-data.

Specifically, the form declaration looks like this:

 <FORM action="http://server.com/cgi/handle"
       enctype="multipart/form-data"
       method="post">
   <P>
   What is your name? <INPUT type="text" name="submit-name"><BR>
   What files are you sending? <INPUT type="file" name="files"><BR>
   <INPUT type="submit" value="Send"> <INPUT type="reset">
 </FORM>

And I think that's all you have to worry about - the webserver should handle it. If you are writing something that directly reads the InputStream from the web client, then you will need to read RFC 2045 and RFC 2046.
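
If you do read the raw InputStream yourself, the bytes still have to be decoded with some charset. The following is only a sketch of that decoding step, not a multipart parser, and it assumes the client sent UTF-8. The class name `StreamDecoding` and the method `readAll` are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamDecoding {
    /** Reads the whole stream and decodes it with the given charset. */
    public static String readAll(InputStream in, Charset charset) {
        StringBuilder out = new StringBuilder();
        // Passing the charset explicitly avoids falling back to the platform default.
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sent = "Fran\u00e7ais"; // "Français"
        byte[] bytes = sent.getBytes(StandardCharsets.UTF_8);

        // An in-memory stream stands in for the request body here.
        String text = readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(text.equals(sent)); // true
    }
}
```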

paulmurray