3

I have implemented a simple file upload-download mechanism. When a user clicks a file name, the file is downloaded with these HTTP headers:

HTTP/1.1 200 OK
Date: Tue, 30 Sep 2008 14:00:39 GMT
Server: Microsoft-IIS/6.0
Content-Disposition: attachment; filename=filename.doc;
Content-Type: application/octet-stream
Content-Length: 10754

I also support Japanese file names. To do that, I encode the file name with this Java method:

private String encodeFileName(String name) throws Exception{
    String agent = request.getHeader("USER-AGENT");
    if(agent != null && agent.indexOf("MSIE") != -1){ // is IE
        StringBuffer res = new StringBuffer();
        char[] chArr = name.toCharArray();
        for(int j = 0; j < chArr.length; j++){
            if(chArr[j] < 128){ // plain ASCII char
                if (chArr[j] == '.' && j != name.lastIndexOf("."))
                    res.append("%2E");
                else
                    res.append(chArr[j]);
            }
            else{ // non-ASCII char
                byte[] byteArr = name.substring(j, j + 1).getBytes("UTF8");
                for(int i = 0; i < byteArr.length; i++){
                    // byte must be converted to unsigned int
                    res.append("%").append(Integer.toHexString((byteArr[i]) & 0xFF));
                }
            }
        }
        return res.toString();
    }
    // Firefox/Mozilla
    return MimeUtility.encodeText(name, "UTF8", "B");
}

This worked well until someone found that it breaks with long file names, for example あああああああああああああああ2008.10.1あ.doc. If I change one of the single-byte dots to a single-byte underscore, or if I remove the first character, it works. In other words, the failure depends on both the length of the name and the URL-encoding of the dot character. A few examples follow.

This is broken (あああああああああああああああ2008.10.1あ.doc):

Content-Disposition: attachment; filename=%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%822008%2E10%2E1%e3%81%82.doc;

This is OK (あああああああああああああああ2008_10.1あ.doc):

Content-Disposition: attachment; filename=%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%822008_10%2E1%e3%81%82.doc;

This is also fine (the same name with the first character removed):

Content-Disposition: attachment; filename=%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%822008%2E10%2E1%e3%81%82.doc;

Anybody have a clue?

bignose
Ovesh

4 Answers

6

Gmail handles file name escaping somewhat differently: the file name is wrapped in double quotes, and single-byte periods are not URL-escaped. With this approach, the long file name in the question works.

Content-Disposition: attachment; filename="%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%822008.10.1%E3%81%82.doc"

However, there is still a limit (apparently IE-only, and a bug, I assume) on the byte length of the file name: once the name exceeds roughly 160 bytes, the beginning of the file name is truncated, even if it consists only of single-byte characters.
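
For illustration, here is a minimal Java sketch of the quoting scheme described above. The class and method names are made up for this example, and the set of ASCII characters left unescaped (letters, digits, '.', '-', '_') is an assumption rather than something taken from Gmail. It does nothing about the length limit.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class QuotedFileNameSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Percent-encodes the UTF-8 bytes of the name, leaving ASCII letters,
    // digits, '.', '-' and '_' untouched (an assumed "safe" set).
    static String encode(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '.' || c == '-' || c == '_';
            if (safe) {
                sb.append((char) c);
            } else {
                sb.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return sb.toString();
    }

    // Builds the header value with the encoded name wrapped in double quotes.
    static String headerValue(String name) {
        return "attachment; filename=\"" + encode(name) + "\"";
    }
}
```

Encoding the whole string with getBytes, rather than one char at a time, also keeps surrogate pairs intact.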

Ovesh
  • Congrats! Sometimes the best answer one can receive is no answer at all, this forces us to look again at the problem - and it's far more rewarding when you yourself solve it ;) – Joe Pineda Sep 30 '08 at 15:49
  • See also https://stackoverflow.com/q/93551/3995261 (the `*=UTF-8''` bit) – YakovL Feb 04 '18 at 13:36
2

As mentioned above, it is impossible to get Content-Disposition with Unicode file names working in all major browsers without browser sniffing and returning different headers for each.

My solution was to avoid the Content-Disposition header entirely, and append the filename to the end of the URL to trick the browser into thinking it was getting a file directly. e.g.

http://www.xyz.com/cgi-bin/dynamic.php/あああああああああああああああ2008.10.1あ.doc

This naturally assumes that you know the filename when you create the link, although a quick redirect header could set it on demand.
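
A rough Java sketch of building such a link, assuming the file name should be percent-encoded as UTF-8 when it is placed in the URL. The class and method names are hypothetical; the base URL is the one from the example above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class DownloadUrlSketch {
    // Appends the file name to the base URL as an extra path segment.
    static String withFileName(String baseUrl, String fileName) {
        try {
            // URLEncoder targets form encoding, so a space becomes '+';
            // inside a URL path it has to be %20. A literal '+' in the
            // name is already encoded as %2B, so the replace is safe.
            String segment = URLEncoder
                    .encode(fileName, StandardCharsets.UTF_8.name())
                    .replace("+", "%20");
            return baseUrl + "/" + segment;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `DownloadUrlSketch.withFileName("http://www.xyz.com/cgi-bin/dynamic.php", name)` produces the link to put in the page.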

Gavin Brock
1

The main issue here is that IE does not support the relevant RFC, RFC 2231. See pointers and test cases. Furthermore, the workaround you use for IE (plain percent-escaped UTF-8) has several additional problems: it may not work in all locales (as far as I recall, it fails in Korea unless IE is configured to always use UTF-8 in URLs, which is not the default), and, as previously mentioned, there are length limits (I hear that this is fixed in IE8, but I have not tried it yet).
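
For reference, a minimal Java sketch of the RFC 2231 style parameter (`filename*=UTF-8''...`). The class and method names are hypothetical, and the set of characters left unescaped is a deliberately conservative assumption. As noted above, IE at the time did not understand this form, so it would not replace the IE workaround.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class ExtendedFileNameSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Builds "attachment; filename*=UTF-8''<percent-encoded name>".
    static String headerValue(String name) {
        StringBuilder sb = new StringBuilder("attachment; filename*=UTF-8''");
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            // Assumed safe set: ASCII letters, digits, '-', '.', '_', '~'.
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~';
            if (safe) {
                sb.append((char) c);
            } else {
                sb.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return sb.toString();
    }
}
```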

Julian Reschke
-2

I think this issue is fixed in IE8; I have seen it working there.

Wouter J
hardik
    This really should be a comment on the answer, not an answer, because it's not helpful to those who need compatibility with IE versions before 8. – Fls'Zen Dec 14 '12 at 21:32