I am currently using jQuery-File-Upload. When I upload files with Japanese or Chinese file names, the browser's debug mode shows the names correctly, for example "お疲れ様です.txt" or "测试文档.txt", but in the Java backend they become "ã�Šç–²ã‚Œæ§˜ã�§ã�™.txt" and "测试文档.txt".
I tried setting formAcceptCharset to UTF-8, but it did not help.
Question:
How can I get the correct file name on the Java side when parsing the multipart form data?

Thanks in advance.

BTW, the following is the request data as sent:

-----------------------------25382434931419
Content-Disposition: form-data; name="file"; filename="�疲れ様��.txt"
Content-Type: text/plain
....

Edit: adding the Java code.
In fact, I currently do nothing special on the Java side:

@POST
@Consumes(MediaType.MULTIPART_FORM_DATA)
public String upload(InMultiPart inMP) {
    while (inMP.hasNext()) {
        InPart part = inMP.next();
        MultivaluedMap<String, String> headers = part.getHeaders();
        String fileName = null;
        if (!headers.containsKey("Content-Disposition")) {
            continue;
        } else {
            // get the file name here
            fileName = parseFileName(headers.getFirst("Content-Disposition"));
        }
        //.....
    }
    //......
}

private String parseFileName(String disposition) {
    int fileNameIndex = disposition.indexOf("filename=");
    if (fileNameIndex < 0) {
        return null;
    }
    int start = disposition.indexOf("\"", fileNameIndex) + 1;
    int end = disposition.indexOf("\"", start);
    return disposition.substring(start, end);
}
zelibobla
Edward
  • Not sure, but aren't Japanese/Chinese characters `UTF-16` encoded? – Rob Mar 12 '13 at 08:19
  • @Rob All Unicode encodings (`UTF`s) can by definition encode all Unicode characters. They are just optimized for different cases. – Esailija Mar 12 '13 at 08:22
  • @Esailija Thanks for your help, I have added the Java code. – Edward Mar 12 '13 at 10:01
  • Well, in your code there is no decoding happening; it's already a string. You need to go to the point where you still have the raw bytes and use the correct encoding to turn them into a string. – Esailija Mar 14 '13 at 09:19
  • @Esailija Thanks for your reply. I know that I did no decoding; however, the String is already there when I receive it, and I can hardly find the _point_ where the raw bytes are. I did nothing on the front end either. Have you ever used that plugin? I found a related answer on its [wiki](https://github.com/blueimp/jQuery-File-Upload/wiki/Frequently-Asked-Questions). According to the question **Is there a problem uploading files with non-ASCII characters**, we need to do something on the server side, but I am now totally confused. – Edward Mar 14 '13 at 13:57
  • @Edward Normally with servlets it's done by calling `request.setCharacterEncoding("utf-8"); response.setCharacterEncoding("utf-8");` before reading anything from the request. I don't know how to do it in the framework you are using. – Esailija Mar 14 '13 at 14:19
  • @Esailija I wrote a Filter to set the characterEncoding to UTF-8 for requests and added the configuration in web.xml; however, nothing changed. BTW, what I am using is [Apache Wink](http://wink.apache.org/), whose server module is an implementation of the JAX-RS v1.1 specification. – Edward Mar 15 '13 at 05:57
  • Did you try this? http://stackoverflow.com/questions/5325322/java-servlet-download-filename-special-characters/13359949#13359949 – Hemang Apr 03 '13 at 12:06
  • It might also be useful to say what your web container / application server is. For instance, with some versions of Tomcat you may need to put some stuff into the filter chain to get it to decode UTF-8 request parameters properly. – Stephen C Apr 28 '13 at 11:31
  • What's the result of Charset.defaultCharset on your server? Some libraries call String<->byte[] conversion without specifying a charset, which causes the default charset to be used (a real nightmare for finding bugs). – Danubian Sailor May 27 '13 at 14:41
  • Thanks for all your comments, and sorry that it has been a long time since I left this question. I finally solved this problem, but the solution is a little embarrassing: I had simply forgotten to set the charset of my workspace to UTF-8. – Edward Jun 08 '13 at 14:56
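
Esailija's point about decoding the raw bytes with the correct encoding can be illustrated with a minimal sketch. It assumes (the thread does not confirm this) that the container decoded the UTF-8 bytes of the header as ISO-8859-1 before handing over the String; `FileNameRepair` and `repair` are hypothetical names used only for illustration. Under that assumption the wrong decode is lossless, so the original bytes can be recovered and decoded again as UTF-8. The literal uses Unicode escapes so the example does not depend on the encoding of the source file or workspace.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Apache Wink or jQuery-File-Upload.
final class FileNameRepair {

    // Assumption: the input is UTF-8 text that was wrongly decoded as ISO-8859-1.
    // ISO-8859-1 maps every byte to exactly one char, so getBytes restores the raw bytes.
    static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Japanese file name from the question, written with Unicode escapes.
        String original = "\u304a\u75b2\u308c\u69d8\u3067\u3059.txt";

        // Simulate the assumed server-side mistake: UTF-8 bytes read as ISO-8859-1.
        String garbled = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        // The garbled form differs from the original, and repair() restores it.
        System.out.println(garbled.equals(original));
        System.out.println(repair(garbled).equals(original));
    }
}
```

This is a workaround rather than a root-cause fix: if the name was decoded with some other charset, or characters were already replaced during decoding, the bytes cannot be recovered this way, and the encoding has to be corrected where the request is first read.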

1 Answer

As Stephen C said, a filter can be used to set the right encoding. We had this problem on JBoss 7.1.1 and solved it with a filter.

In web.xml:

<filter>
    <display-name>set character encoding</display-name>
    <filter-name>RequestEncodingFilter</filter-name>
    <filter-class>com.myapp.RequestEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>RequestEncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

Filter class:

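import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class RequestEncodingFilter implements Filter {

    // Name of the init-param declared in web.xml
    private static final String ENCODING = "encoding";
    private String configuredEncoding;

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // Read the encoding (UTF-8 in the web.xml above) once at startup
        configuredEncoding = filterConfig.getInitParameter(ENCODING);
    }

    @Override
    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain) throws IOException, ServletException {
        // Must be set before anything reads parameters or the body of the request
        servletRequest.setCharacterEncoding(configuredEncoding);
        filterChain.doFilter(servletRequest, servletResponse);
    }

    @Override
    public void destroy() {
        // Nothing to clean up
    }

}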

Danubian Sailor
Dan