
First of all, I'm pretty sure this is not a duplicate because I've been researching this topic for quite some time, both on StackOverflow and elsewhere. Similar questions have been asked, but none were answered satisfactorily.

Related (but not identical) questions from the past:

I'm also fully aware of mod_rewrite tricks that make it completely unnecessary to juggle filenames in HTTP headers. But let's suppose that this is not an option.


Most modern browsers (IE9+, Firefox, Chrome) support RFC2231/5987 when downloading files with non-ASCII characters in their names. In those cases, the following PHP code works like a charm:

header("Content-Disposition: attachment; " .
       "filename*=UTF-8''" . rawurlencode($filename));

IE <= 8 doesn't understand RFC2231/5987, but the following code works most of the time. Since every browser has tried to emulate IE to some extent, this also works in many other browsers, such as Firefox.

header("Content-Disposition: attachment; " .
       'filename="' . rawurlencode($filename) . '"');

Meanwhile, Chrome < 11 and Safari < 6 seem to prefer the following, despite the fact that it places non-ASCII characters directly in the header.

header("Content-Disposition: attachment; filename=" . $filename);

So far so good.


But everything falls apart when it comes to Android's default browser app. (So far, I've tested this in Gingerbread, Ice Cream Sandwich and Jelly Bean.)

If you give it the standard RFC2231/5987 treatment, the default browser completely ignores it and tries to guess the filename from the last part of the URL.

If you give it the usual non-standard (IE <= 8) treatment, either the default browser tries to interpret the filename as ISO-8859-1, leading to an unintelligible jumble of characters, or it silently discards all non-ASCII characters. The exact behavior differs between versions, but in any case it is clear that Android's default browser was not designed to support the rawurlencode() format, either.

The same thing happens if you put the raw filename in the header.

This is usually not an issue with third-party browsers, such as Firefox for Android, Dolphin Browser, and Boat Browser. The default browser app is the only one that consistently fails to understand UTF-8 filenames.


Perhaps this was finally fixed in a recent version of Android, or perhaps it will be fixed in the next version. But that's not my question. I need this to work in existing devices, and there are still millions of Gingerbread and ICS devices out there.

I've read the bug reports, I've read the complaints, I've read pretty much everything there is to read about this problem. So far I have been unable to find any encoding scheme that actually works.

If anyone knows how to encode a non-ASCII filename (e.g. файла파일ファイル名.jpg) in a Content-Disposition header and have the Android default browser recognize it, please share it! I don't care how hacky or non-standard it is. I don't care if it needs to be customized for each version of Android.

Update

Unfortunately, so far I have not received any answer that actually solves the problem mentioned above, so the bounty expires unclaimed. Please don't answer unless you actually know how to encode non-European, mixed-language filenames in a way that is recognized by Android Browser prior to ICS, or you have solid evidence that this is impossible.

kijin
  • Can you tell me whether this needs to work only with your own app, or for users browsing normally? In versions prior to 4.0.1 the Content-Disposition header is discarded. – Diogo Bento Apr 05 '14 at 01:12
  • Since Android is open source, maybe you can look into Android's source code and find out what the exact treatment of filenames is. Then you may also discover how to work around it. – Gerben Apr 05 '14 at 01:14
  • @DiogoBento This is not for an app, it's for a website. So it needs to be compatible with a wide range of both desktop and mobile browsers. The default Browser app doesn't exactly *discard* the Content-Disposition, it only acts weird when there are non-ASCII characters in the filename. – kijin Apr 06 '14 at 07:43

3 Answers


URLUtil.java is responsible for guessFileName, which calls parseContentDisposition, which in turn uses the regular expression "attachment;\\s*filename\\s*=\\s*(\"?)([^\"]*)\\1\\s*$" to extract the filename from the Content-Disposition header.

The source code below, which attempts to replicate the parseContentDisposition functionality, worked correctly when I tested it: it returns файла파일ファイル名.jpg.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloWorld {

    public static void main(String[] args) {
        String contentDisposition = "Content-Disposition: attachment; " + " filename=" + "\"файла파일ファイル名.jpg\"";
        Pattern CONTENT_DISPOSITION_PATTERN = Pattern.compile(
                "attachment;\\s*filename\\s*=\\s*(\"?)([^\"]*)\\1\\s*$",
                Pattern.CASE_INSENSITIVE);
        try {
            Matcher m = CONTENT_DISPOSITION_PATTERN.matcher(contentDisposition);
            if (m.find()) {
                System.out.println("Result: " + m.group(2));
            }
        } catch (IllegalStateException ex) {
            // This function is defined as returning null when it can't parse the header
        }
    }
}
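
To see how the same pattern treats the three header forms listed in the question, it can be run against each of them. This is only a sketch: the class name HeaderFormsDemo and the helper parse are made up for illustration, the sample filename is shortened to 파일.jpg, and it exercises nothing but the regular expression quoted above, so it says nothing about what the rest of the Android download path does with the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderFormsDemo {
    // Same pattern as quoted above.
    static final Pattern CONTENT_DISPOSITION_PATTERN = Pattern.compile(
            "attachment;\\s*filename\\s*=\\s*(\"?)([^\"]*)\\1\\s*$",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: the captured filename, or null when nothing matches.
    static String parse(String headerValue) {
        Matcher m = CONTENT_DISPOSITION_PATTERN.matcher(headerValue);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // RFC 5987 form: "filename*=" cannot match "filename\s*=", so no filename is found.
        System.out.println(parse("attachment; filename*=UTF-8''%ED%8C%8C%EC%9D%BC.jpg"));
        // Quoted, percent-encoded form: matches, but the pattern itself does no decoding.
        System.out.println(parse("attachment; filename=\"%ED%8C%8C%EC%9D%BC.jpg\""));
        // Raw, unquoted form: matches and is captured as-is.
        System.out.println(parse("attachment; filename=파일.jpg"));
    }
}
```

If the pattern is all that runs, the first form yielding no match would be consistent with the browser falling back to the URL for the RFC 2231/5987 header.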
Appleman1234
  • That code means that the "old Chrome and Safari" encoding method (third example in the question) should work. Unfortunately, real-life Android Browser seems to behave differently, either ignoring the filename altogether (Gingerbread) or discarding all non-ASCII characters (Jelly Bean). – kijin Apr 06 '14 at 07:40

Sadly I cannot test it right now, but I use the following code to send files to browsers and so far I haven't had any problems (even with Android's browsers). I hope it helps:

    $formatRFC2231 = 'filename*=UTF-8\'\'%s';
    $formatDef = 'filename="%s"';

    switch ($useragent) {
        case "Opera":
        case "Firefox":
            $filename = rawurlencode($name);
            $format = $formatRFC2231;
            break;
        case "IE":
        case "Safari":
        case "Chrome":
            $filename = rawurlencode($name);
            $format = $formatDef;
            break;
        default:
            $filename = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $name);
            $format = $formatDef;
            break;
    }

The key point here is the iconv call, which converts the UTF-8 filename to ISO-8859-1 and transliterates characters that have no direct ISO-8859-1 equivalent.
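
A quick way to check which filenames can survive an ISO-8859-1 fallback at all is to ask an encoder whether it can represent them. The sketch below is in Java rather than PHP and is not equivalent to iconv: Java's encoder does not transliterate, so it only reports whether every character has an ISO-8859-1 representation. The class name Latin1Check and the helper fitsLatin1 are hypothetical.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class Latin1Check {
    // Hypothetical helper: true only if every character exists in ISO-8859-1.
    static boolean fitsLatin1(String name) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(name);
    }

    public static void main(String[] args) {
        // Accented Latin name ("résumé.jpg"), written with escapes to avoid source-encoding issues.
        System.out.println(fitsLatin1("r\u00e9sum\u00e9.jpg"));
        // Korean name ("파일.jpg"): these characters have no ISO-8859-1 representation.
        System.out.println(fitsLatin1("\uD30C\uC77C.jpg"));
    }
}
```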

Antonio E.
  • `//TRANSLIT` may work for accented Latin characters like ãèù, but I don't think it's possible to transliterate 中文, 한글 or にほんご into ISO-8859-1. (I tried and only got a bunch of question marks. Guess what, question marks are invalid in filenames.) – kijin Apr 06 '14 at 13:51
  • What about trying to find the right iso-* encoding (I think it is around cp25x, but I cannot remember right now) and converting the filename to that with the iconv function? – Antonio E. Apr 10 '14 at 21:50
  • Did you read my question? I linked to that thread at the top of my question. I know the relevant RFCs, I'm just looking for a workaround for a specific user agent that doesn't respect the RFCs. As I mentioned in the question, I'm also aware of how to use `mod_rewrite` to put the filename in the URL itself, but I'm looking for a solution that doesn't involve URL rewriting. Thank you for trying to help, but if you really want to help, please read the question carefully and tell me something I don't already know. – kijin Apr 11 '14 at 00:27
  • There's CP932 for Japanese, CP949 for Korean, CP936 and CP950 for Chinese, and CP874 for Thai, but UTF-* are the only encodings that can encode mixed-language filenames (which are surprisingly common in some parts of Asia), even with //TRANSLIT. Very few Korean characters, for example, look even remotely similar to a Japanese character. – kijin Apr 11 '14 at 00:45

I believe you are encountering a bug in the Android download manager, as described here:

https://code.google.com/p/chromium/issues/detail?id=162333

Julian Reschke