10

I am writing a simple file download servlet and I can't get correct filenames. I tried URL-encoding and MIME-encoding the filename as suggested in existing answers, but neither worked.

The fileData object in the following snippet contains the MIME type, the byte[] content, and the filename. The filename needs at least the ISO-8859-2 charset; ISO-8859-1 is not enough.

How can I get my browser to display the downloaded filename correctly?

Here is an example of the filename: árvíztűrőtükörfúrógép.xls and it results in: árvíztqrptükörfúrógép.xls

  protected void renderMergedOutputModel(Map model, HttpServletRequest req, HttpServletResponse res) throws Exception {

    RateDocument fileData = (RateDocument) model.get("command.retval");
    OutputStream out = res.getOutputStream();
    if(fileData != null) {
        res.setContentType(fileData.getMime());
        String enc = "utf-8"; //tried also: ISO-8859-2

        String encodedFileName = fileData.getName();
            // also tried URLencoding and mime encoding this filename without success

        res.setCharacterEncoding(enc); //tried with and without this
        res.setHeader("Content-Disposition", "attachment; filename=" + encodedFileName);
        res.setContentLength(fileData.getBody().length);
        out.write(fileData.getBody());
    } else {
        res.setContentType("text/html");
        out.write("<html><head></head><body>Error downloading file</body></html>"
                .getBytes(res.getCharacterEncoding()));
    }
    out.flush();
  }
jabal
  • Please give some examples of how file names look and what you get instead. – BalusC Mar 16 '11 at 12:53
  • árvíztűrőtükörfúrógép.xls --> árvíztqrptükörfúrógép.xls – jabal Mar 16 '11 at 13:15
  • 1
    Yes, you are right. These two characters are not in ISO-8859-1 only in ISO-8859-2, causing many problems for every Hungarian developer.. :-) – jabal Mar 16 '11 at 13:28

6 Answers

20

I found a solution that works in all the browsers I have installed (IE8, FF16, Opera12, Chrome22).
It relies on the fact that, when no other encoding is specified, browsers expect the value of the filename parameter to be encoded in the browser's native encoding.

Usually the browser's native encoding is UTF-8 (Firefox, Opera, Chrome). IE's native encoding on my machine is windows-1250; IE appears to follow the system locale, so this may differ for other users.

So if we put into the filename parameter a value encoded as UTF-8 or windows-1250, depending on the user's browser, it should work. At least, it works for me.

String fileName = "árvíztűrőtükörfúrógép.xls";

String userAgent = request.getHeader("user-agent");
boolean isInternetExplorer = (userAgent.indexOf("MSIE") > -1);

try {
    byte[] fileNameBytes = fileName.getBytes((isInternetExplorer) ? ("windows-1250") : ("utf-8"));
    String dispositionFileName = "";
    for (byte b: fileNameBytes) dispositionFileName += (char)(b & 0xff);

    String disposition = "attachment; filename=\"" + dispositionFileName + "\"";
    response.setHeader("Content-disposition", disposition);
} catch(UnsupportedEncodingException ence) {
    // ... handle exception ...
}

Of course, this has only been tested on the browsers mentioned above, and I cannot guarantee that it will work in every browser all the time.
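
On the question raised in the comments about why the byte loop works: the loop produces the same string as decoding the UTF-8 bytes as ISO-8859-1, so every byte becomes one char in the range 0..255. The sketch below assumes the servlet container writes header characters back out as ISO-8859-1 bytes (many containers do, but this is an assumption, not something the answer states). Under that assumption the original UTF-8 bytes reach the browser unchanged. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class HeaderByteSmuggling {
    // Same result as the loop in the answer: one char (0..255) per encoded byte.
    static String bytesAsLatin1Chars(String fileName) {
        byte[] utf8 = fileName.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u0151.xls"; // o with double acute, followed by ".xls"
        String smuggled = bytesAsLatin1Chars(original);

        // The single non-ASCII character is now represented by two chars (0xC5, 0x91).
        System.out.println(smuggled.length() - ".xls".length()); // prints 2

        // Assumption: the container encodes header chars as ISO-8859-1 when writing them.
        byte[] onTheWire = smuggled.getBytes(StandardCharsets.ISO_8859_1);

        // A browser that reads those bytes as UTF-8 recovers the original name.
        System.out.println(new String(onTheWire, StandardCharsets.UTF_8).equals(original)); // prints true
    }
}
```

So the intermediate string is not meant to be readable text; it is only a carrier for the raw bytes.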

Note #1 (@fallen): It is not correct to use the URLEncoder.encode() method here. Despite its name, it does not produce URL encoding; it produces form encoding (application/x-www-form-urlencoded). Form encoding is quite similar to URL encoding and often produces the same result, but there are differences. For example, the space character ' ' is encoded as '+' instead of '%20'.

For a correctly URL-encoded string, you should use the URI class:

URI uri = new URI(null, null, "árvíztűrőtükörfúrógép.xls", null);
System.out.println(uri.toASCIIString());
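
To make the difference from Note #1 concrete, here is a small sketch that runs both encoders on a name containing a space. The class and method names are hypothetical; only URLEncoder.encode and the four-argument URI constructor come from the answer.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

final class FormVersusUriEncoding {
    // Form encoding (application/x-www-form-urlencoded): a space becomes '+'.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Percent-encoding of a URI path: a space becomes "%20".
    static String uriEncode(String s) {
        try {
            return new URI(null, null, s, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String name = "my file.xls";
        System.out.println(formEncode(name)); // prints my+file.xls
        System.out.println(uriEncode(name));  // prints my%20file.xls
    }
}
```

A browser that percent-decodes the header value would show a literal '+' for the first form, which is why the second is preferable here.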
sporak
  • I think you'll still have issues if your filename contains " but otherwise this is awesome - thanks! – teedyay Dec 18 '12 at 12:21
  • 3
    IE's native encoding is Central/Eastern European code page? You must be joking. The only thing it shows is that IE use the local browser's system locale. Sadly, I do not think there is a reliable way to detect it from the server. – Yongwei Wu Feb 27 '13 at 05:32
  • 1
    Why does this work? If the original `fileName` is just a single character, for example `ő`, then `fileName.getBytes("UTF-8")` will return a byte array with two elements `0xC5 0x91`. The above solution loops over these two bytes and appends them to a new string. This new string will be two *characters* long and four *bytes* long. What the heck? By the way it works, but I can't wrap my head around why. – Kohányi Róbert Jun 02 '15 at 13:42
3

Based on the great answers given here, I have developed an extended version, which I have already put into production. It is based on RFC 5987 and this test suite.

String filename = "freaky-multibyte-chars";
StringBuilder contentDisposition = new StringBuilder("attachment");
CharsetEncoder enc = StandardCharsets.US_ASCII.newEncoder();
boolean canEncode = enc.canEncode(filename);
if (canEncode) {
    contentDisposition.append("; filename=").append('"').append(filename).append('"');
} else {
    enc.onMalformedInput(CodingErrorAction.IGNORE);
    enc.onUnmappableCharacter(CodingErrorAction.IGNORE);

    String normalizedFilename = Normalizer.normalize(filename, Form.NFKD);
    CharBuffer cbuf = CharBuffer.wrap(normalizedFilename);

    ByteBuffer bbuf;
    try {
        bbuf = enc.encode(cbuf);
    } catch (CharacterCodingException e) {
        bbuf = ByteBuffer.allocate(0);
    }

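    // The third argument is a length, not an end index, so use remaining() rather than limit().
    String encodedFilename = new String(bbuf.array(), bbuf.arrayOffset() + bbuf.position(),
            bbuf.remaining(), StandardCharsets.US_ASCII);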

    if (StringUtils.isNotEmpty(encodedFilename)) {
        contentDisposition.append("; filename=").append('"').append(encodedFilename)
                .append('"');
    }

    URI uri;
    try {
        uri = new URI(null, null, filename, null);
    } catch (URISyntaxException e) {
        uri = null;
    }

    if (uri != null) {
        contentDisposition.append("; filename*=UTF-8''").append(uri.toASCIIString());
    }

}
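
The ASCII fallback step above (NFKD normalization, then dropping everything that is not ASCII) can be tried in isolation. This is a simplified sketch, not the code from the answer: it swaps the CharsetEncoder for a regular expression, and the class and method names are invented for illustration.

```java
import java.text.Normalizer;

final class AsciiFallback {
    // Decompose accented letters into base letter + combining mark,
    // then strip every char outside the ASCII range (including the marks).
    static String toAsciiFallback(String filename) {
        String decomposed = Normalizer.normalize(filename, Normalizer.Form.NFKD);
        return decomposed.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // The example filename from the question, written with Unicode escapes
        // so the source file encoding does not matter.
        String name = "\u00e1rv\u00edzt\u0171r\u0151t\u00fck\u00f6rf\u00far\u00f3g\u00e9p.xls";
        System.out.println(toAsciiFallback(name)); // prints arvizturotukorfurogep.xls
    }
}
```

Characters without a decomposition (for example CJK text) simply disappear, so the fallback can end up empty; that is why the answer checks for an empty string before adding the plain filename parameter.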
Michael-O
3

Unfortunately, it depends on the browser. See this discussion of the problem. To solve it, look at this site, which shows examples of different headers and how they behave in different browsers.

ilalex
1

I recently solved this issue in my application. Here is a solution that works in Firefox only; sadly, it fails in IE.

response.addHeader("Content-Disposition", "attachment; filename*='UTF-8'" + URLEncoder.encode("árvíztűrőtükörfúrógép", "UTF-8") + ".xls");

fallen
  • thank you, but I am still looking for the ultimate solution.. :-) Currently I change every ű to u and ő to o in filenames, this is better than ? marks. – jabal May 10 '11 at 08:09
  • Could anyone tell me what is the result in case i use safari 5.1.7. I am having the same issues. the above code is working well on firefox, chrome and IE but its not working on safari. – vermaraj Jul 31 '14 at 09:45
0
private void setContentHeader(HttpServletResponse response, String userAgent, String fileName) throws UnsupportedEncodingException {
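    // URLEncoder produces form encoding, so turn '+' (an encoded space) into "%20".
    // A literal '+' in the name is already encoded as "%2B", so this replacement is safe.
    fileName = URLEncoder.encode(fileName, "UTF-8").replace("+", "%20");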
    boolean isFirefox = (userAgent.indexOf("Firefox") > -1);
    if (isFirefox) {
        response.setHeader(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename*=UTF-8''" + fileName);
    } else {
        response.setHeader(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=" + fileName);
    }
}
reznic
0

Summing up everything I have read so far, this works for me:


    URI uri = new URI( null, null, fileName, null);
    String fileNameEnc = uri.toASCIIString(); //URL encoded.
    String contDisp = String.format( "attachment; filename=\"%s\";filename*=utf-8''%s", fileName, fileNameEnc);
    response.setHeader( "Content-disposition", contDisp);
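
Two details may be worth tightening in a header built this way. First, a double quote or backslash in the name breaks the quoted filename value, as a comment on an earlier answer points out. Second, as far as I can tell URI.toASCIIString() leaves characters such as ';' and ',' unescaped, although RFC 5987 does not allow them unencoded in filename*. The sketch below percent-encodes everything outside the RFC 5987 attr-char set by hand and escapes the quoted value. All names are hypothetical, and it assumes the caller supplies an ASCII-only fallback name.

```java
import java.nio.charset.StandardCharsets;

final class ContentDispositionBuilder {
    // attr-char from RFC 5987: letters, digits and a few punctuation marks.
    private static final String ATTR_CHARS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$&+-.^_`|~";

    // Percent-encode the UTF-8 bytes of every character that is not an attr-char.
    static String encodeRfc5987(String value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int unsigned = b & 0xff;
            if (unsigned < 0x80 && ATTR_CHARS.indexOf((char) unsigned) >= 0) {
                sb.append((char) unsigned);
            } else {
                sb.append('%').append(String.format("%02X", unsigned));
            }
        }
        return sb.toString();
    }

    // Wrap an ASCII value in double quotes, escaping backslashes and quotes.
    static String quote(String asciiValue) {
        return '"' + asciiValue.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
    }

    // Plain filename for older browsers, filename* for those that support RFC 5987.
    static String build(String asciiFallback, String fullName) {
        return "attachment; filename=" + quote(asciiFallback)
                + "; filename*=UTF-8''" + encodeRfc5987(fullName);
    }

    public static void main(String[] args) {
        // prints attachment; filename="o file.xls"; filename*=UTF-8''%C5%91%20file.xls
        System.out.println(build("o file.xls", "\u0151 file.xls"));
    }
}
```

The result would be passed to response.setHeader("Content-Disposition", ...) exactly as in the snippet above.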