
I'm trying to upload files from an HTML form. I wrote a servlet for it, which should open an input stream on each received part and write the data to a file with the same name and extension. At first I had problems with the data itself: for example, text files with a Unicode body did not come out with the right UTF-8 characters. Then I started using DataInputStream and DataOutputStream, and for some reason that now works correctly.

What remains is the problem with the filename. When the filename contains Unicode characters, the filename itself doesn't get the right encoding, and some odd characters appear (as expected). I've tried several things, but I don't know how to fix it. I'm using Wildfly 10.0.10.Final. So, for example, if my file has the name ááéé.txt, the resulting file name is ááéé.txt.

This is my HTML page:

<html>
<h:head>        
    <meta charset="UTF-8" />
    <meta content="text/html" />
</h:head>
<h:body>
    <div class="container">
        Upload a new file:
        <form enctype="multipart/form-data" method="post" action="upload">
            Files: <input multiple="multiple" id="fileUpload" type="file" name="files" />
            <input type="submit" multiple="multiple" value="upload" />
        </form>
    </div>   
</h:body>
</html>

My servlet is written as below:

@WebServlet(name = "fileUploadServlet", urlPatterns = {"/upload"})
@MultipartConfig
public class FileUploadServlet extends HttpServlet {    
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        req.setCharacterEncoding("UTF-8");
        int n = 0;
        for (Part file : req.getParts()) {
            String fileName = new String(file.getSubmittedFileName().getBytes("UTF-8"), "UTF-8");            
            try (DataInputStream dis = new DataInputStream(file.getInputStream());
                 DataOutputStream dos = new DataOutputStream(new FileOutputStream("E:\\upload\\" + fileName))) {
                byte[] buffer = new byte[1024];
                int r;
                while ((r = dis.read(buffer)) != -1) {
                    dos.write(buffer, 0, r);
                }
                n++;
            }
        }
        resp.getWriter().print(n + " files uploaded.");
    }
}

Thanks in advance!

masm64
  • `String fileName = new String(file.getSubmittedFileName().getBytes("UTF-8"), "UTF-8");` is *exactly the same* as `String fileName = file.getSubmittedFileName();`. You're converting a String to bytes and then back to characters, all using the same charset, effectively an identity round-trip operation. I do find the use of `req.setCharacterEncoding("UTF-8")` suspicious, though. The HTTP request sent its own encoding for a reason; you shouldn't override that. – VGR Apr 02 '16 at 13:46
  • But I guess that encoding is not UTF-8? – masm64 Apr 02 '16 at 13:48
  • Probably not. A browser does not have to send data using the same charset that was used to encode the HTML. – VGR Apr 02 '16 at 13:56
  • How can I modify that behaviour? – masm64 Apr 02 '16 at 13:56
  • You shouldn't need to. If you remove the req.setCharacterEncoding("UTF-8") line, what does the submitted filename look like? – VGR Apr 02 '16 at 14:01
  • If I remove that line it looks like this: ááéé.txt – masm64 Apr 02 '16 at 14:07
  • I suspect a misconfigured server. If you're using Tomcat, you may want to look at http://stackoverflow.com/questions/16527576/httpservletrequest-utf-8-encoding . – VGR Apr 02 '16 at 14:27
  • I'm using Wildfly but my default encoding and file encoding seems to be UTF-8 when I look it up in the admin console. – masm64 Apr 02 '16 at 14:37
  • You should set up a proxy and see exactly what the request looks like on the wire, including which encoding headers are sent and what the actual bytes are. – jtahlborn Apr 02 '16 at 14:54
  • I hope this helps http://oi66.tinypic.com/2r4lah2.jpg – masm64 Apr 02 '16 at 14:59
  • the problem seems to be that the upload request doesn't include a character set encoding and my guess is that it is not utf-8. – jtahlborn Apr 02 '16 at 16:23
  • Can I somehow make it include a charset? – masm64 Apr 02 '16 at 16:33
  • Is this helpful? http://stackoverflow.com/q/35413585 and http://stackoverflow.com/q/33941751 – BalusC Apr 02 '16 at 19:07
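
VGR's first comment in the thread above can be checked in isolation. The following is a minimal sketch, not part of the original question: the class name `RoundTripCheck` and the helper `roundTrip` are made up for illustration, and `StandardCharsets.UTF_8` is used in place of the `"UTF-8"` string only to avoid the checked exception.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripCheck {
    // Hypothetical helper: encode to UTF-8 bytes and decode them again as UTF-8,
    // mirroring the expression used in the question's servlet.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "ááéé.txt" written with Unicode escapes so the source file encoding does not matter.
        String name = "\u00e1\u00e1\u00e9\u00e9.txt";
        // Encoding and decoding with the same charset gives back an equal string.
        System.out.println(roundTrip(name).equals(name));
    }
}
```

Because the round trip changes nothing, any garbling has to happen before `getSubmittedFileName()` returns, which is consistent with the later comments pointing at how the request is decoded.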

2 Answers


It seems the WildFly implementation doesn't use the request's character encoding. I found a solution:

String filename = new String(part.getSubmittedFileName().getBytes("ISO-8859-1"), "UTF-8");
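
To see why this re-decoding can work, here is a small standalone sketch. It assumes (this is not confirmed in the thread) that the container decoded the UTF-8 bytes of the filename as ISO-8859-1. The class name `FilenameRecoder` and the method `recode` are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class FilenameRecoder {
    // Hypothetical helper: undo a wrong ISO-8859-1 decoding of UTF-8 bytes.
    // ISO-8859-1 maps every byte to exactly one char, so getBytes recovers
    // the original bytes, which are then decoded as UTF-8.
    static String recode(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "ááéé.txt" written with Unicode escapes so the source file encoding does not matter.
        String original = "\u00e1\u00e1\u00e9\u00e9.txt";

        // Simulate the assumed server behaviour: UTF-8 bytes read as ISO-8859-1.
        String misdecoded = new String(original.getBytes(StandardCharsets.UTF_8),
                                       StandardCharsets.ISO_8859_1);

        // Each accented letter became two chars, so the garbled name is longer.
        System.out.println(misdecoded.length() > original.length());

        // Re-decoding restores the original name.
        System.out.println(recode(misdecoded).equals(original));
    }
}
```

Note that this only helps while the server really does mis-decode the name. If the filename already arrives correctly decoded, applying the same conversion would likely damage any non-ASCII characters, so it may be worth guarding the workaround behind a check or a configuration switch.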
Fruchtzwerg
maximwirt

req.setCharacterEncoding(...) sometimes does not work.

If you are using Tomcat, set URIEncoding="UTF-8" in the Connector section of your server.xml, e.g.

<Connector port="80" protocol="HTTP/1.1" maxThreads="150" connectionTimeout="20000" enableLookups="false"
    URIEncoding="UTF-8" redirectPort="443" />

There may be a similar setting in Wildfly, I guess.

auntyellow