
We're using this code to generate requests and set the filename for the download:

var request = new GetPreSignedUrlRequest()
    .WithBucketName(S3BucketName)
    .WithExpires(requestExpirationTime)
    .WithKey(file.S3Key)
    .WithResponseHeaderOverrides(
        new ResponseHeaderOverrides()
            .WithContentDisposition("attachment; filename=\"Unicode FileName ᗩ Test.txt\""));

This generates the following link:

/s3path?AWSAccessKeyId=xxxx&Expires=1377199946&response-content-disposition=attachment%3B%20filename%3D"Unicode%20FileName%20ᗩ%20Test.txt"&Signature=xxxxx

This gives the following error:

<Error>
    <Code>InvalidArgument</Code>
    <Message>
        Header value cannot be represented using ISO-8859-1.
    </Message>
    <ArgumentValue>attachment; filename="Unicode ᗩ filename.txt"</ArgumentValue>
    <ArgumentName>response-content-disposition</ArgumentName>
    <RequestId>368BD60502854514</RequestId>
    <HostId>
        BiUUYp4d9iXfK68jKVxWZEp25m5je166M0ZY1VmoPk9pN9A69HLHcff6WIVLWk1B
    </HostId>
</Error>

How can we use non-ISO-8859-1 characters, such as arbitrary Unicode characters, in the response-content-disposition header?
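The error suggests that S3 percent-decodes the query parameter and then requires the resulting header value to be representable in ISO-8859-1 (Latin-1). The character ᗩ (U+15E9) is not. Below is a minimal Python sketch of that check. It assumes the rule is simply "every character must encode as Latin-1". The helper name `is_latin1` is hypothetical, and S3's exact server-side validation is not documented here.

```python
def is_latin1(value: str) -> bool:
    """Return True if every character of value can be encoded as ISO-8859-1."""
    try:
        value.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


# U+15E9 (ᗩ) has no ISO-8859-1 byte, so this header value is rejected.
print(is_latin1('attachment; filename="Unicode FileName ᗩ Test.txt"'))  # False
# é is U+00E9, which ISO-8859-1 can represent.
print(is_latin1('attachment; filename="Café.txt"'))  # True
```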

CB-Dan

3 Answers


I had this issue, and I solved it by encoding the Unicode string correctly.

I was using Python with boto:

>>> import urllib
>>> encoded = urllib.quote('Unicode FileName ᗩ Test.txt')
>>> print encoded
Unicode%20FileName%20%E1%97%A9%20Test.txt

Then use this encoded string as the file name in the response-content-disposition header.
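Here is a Python 3 sketch of the same idea. It assumes you want a plain ASCII `filename` fallback plus the RFC 6266 `filename*` parameter, which uses the `UTF-8''` percent-encoding from RFC 5987 (since updated by RFC 8187). That parameter is the standard way to carry a non-ASCII name in Content-Disposition. This is a swapped-in technique, not exactly what the answer above does. The helper name `content_disposition` and the `_` substitution in the fallback are illustrative choices, not part of boto or S3.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build an ASCII-only Content-Disposition value (hypothetical helper).

    filename  : ASCII fallback for older clients
    filename* : RFC 5987 / RFC 8187 ext-value, UTF-8 bytes percent-encoded
    """
    # Fallback: characters outside printable ASCII become '_'.
    fallback = "".join(c if " " <= c <= "~" else "_" for c in filename)
    # Escape backslash and double quote so the fallback is a valid quoted-string.
    fallback = fallback.replace("\\", "\\\\").replace('"', '\\"')
    # safe="" percent-encodes everything except letters, digits and "_.-~".
    encoded = quote(filename, safe="", encoding="utf-8")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"


print(content_disposition("Unicode FileName ᗩ Test.txt"))
# attachment; filename="Unicode FileName _ Test.txt"; filename*=UTF-8''Unicode%20FileName%20%E1%97%A9%20Test.txt
```

The resulting value is pure ASCII, so it is also valid ISO-8859-1. With boto3, for example, it would go in the `ResponseContentDisposition` parameter of `generate_presigned_url("get_object", Params=...)`.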

In Java, I believe you can get a similar result with the following (note that URLEncoder encodes spaces as + rather than %20):

URLEncoder.encode(original_string, "UTF-8")
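Python's `urllib.parse` offers both space conventions. `quote_plus` turns spaces into `+`, as `URLEncoder.encode` does, while `quote` with `safe=""` produces `%20`. Neither is an exact match for the Java method. In Python 3.7+, `~` is left as is and `*` is encoded, which is the reverse of URLEncoder. Treat this only as an illustration of the space handling:

```python
from urllib.parse import quote, quote_plus

name = "Unicode FileName ᗩ Test.txt"

# quote_plus: spaces become '+', similar to java.net.URLEncoder.encode(name, "UTF-8")
print(quote_plus(name))      # Unicode+FileName+%E1%97%A9+Test.txt
# quote with safe="": spaces become %20, which is what a file name needs
print(quote(name, safe=""))  # Unicode%20FileName%20%E1%97%A9%20Test.txt
```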

Hope this helps someone else at some point!

Alex Couper
  • I found this function (in .NET): `System.Web.HttpUtility.UrlEncode(fileName, Encoding.UTF8)`. The problem is that it also replaces spaces with the + character and encodes most non-letter characters such as ', which makes the downloaded file's name look messy. I found the perfect function for the job, but it is sadly marked as internal (in HttpEncoder.cs), so it cannot be used directly without some hacks: `// Helper to encode the non-ASCII url characters only internal String UrlEncodeNonAscii(string str, Encoding e)` – CB-Dan Feb 21 '14 at 16:16
  • Tried this for Java, still the same problem: `attachment; filename*=UTF-8''犬.jpg` – mahfuj asif Apr 09 '20 at 10:27
  • We don't want Unicode%20%E1%97%A9%20filename.txt in the response; we want Unicode FileName ᗩ Test.txt. Also, this adds a + sign in place of every space. Is there any way to prevent it? – shinzou Oct 31 '21 at 16:36
  • I just want to add that the output of the print would actually be `Unicode%20FileName%20%E1%97%A9%20Test.txt`, and that the Python 3+ equivalent is `encoded = urllib.parse.quote('Unicode FileName ᗩ Test.txt')`. – Joseph Mar 02 '22 at 20:58
  • Regarding the issue with spaces and other characters being replaced: `urllib.parse.quote` accepts a `safe` keyword argument, a string of characters that should not be quoted, e.g. `urllib.parse.quote("Unicode FileName ᗩ Test.txt", safe=" ")`, or any of the character sets from the `string` module if you want to go that route. – Zhenhir Dec 05 '22 at 02:00

As mentioned in this Stack Overflow answer (https://stackoverflow.com/a/216777/1038611), there is no interoperable way to encode non-ASCII names in Content-Disposition. Browser compatibility is a mess.

The way we ended up doing it, so that it works in all browsers, is to replace all non-ISO-8859-1 characters with '-'. Here's the code:

using System.Text;

// Containing class name is illustrative.
public static class FileNameHelper
{
    private static readonly Encoding ContentDispositionHeaderEncoding = Encoding.GetEncoding("ISO-8859-1");

    public static string GetWebSafeFileName(string fileName)
    {
        // We need to convert the file name to ISO-8859-1 due to browser compatibility problems
        // with the Content-Disposition header (see: https://stackoverflow.com/a/216777/1038611)
        var webSafeFileName = Encoding.Convert(Encoding.Unicode, ContentDispositionHeaderEncoding, Encoding.Unicode.GetBytes(fileName));

        // Furthermore, any characters not supported by ISO-8859-1 will be replaced by « ? »,
        // which is not an acceptable file name character, so we replace these as well.
        return ContentDispositionHeaderEncoding.GetString(webSafeFileName).Replace('?', '-');
    }
}
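For comparison, here is the same replacement approach sketched in Python. The function name `web_safe_file_name` is hypothetical. `errors="replace"` substitutes `?` for each character that ISO-8859-1 cannot represent, which mirrors the default .NET replacement behavior the C# code relies on. As in that code, any `?` already in the name also becomes `-`.

```python
def web_safe_file_name(file_name: str) -> str:
    """Replace characters outside ISO-8859-1 with '-' (hypothetical helper)."""
    # encode() with errors="replace" puts '?' in place of unmappable characters.
    latin1_bytes = file_name.encode("iso-8859-1", errors="replace")
    # Decode back to str and turn every '?' (including pre-existing ones) into '-'.
    return latin1_bytes.decode("iso-8859-1").replace("?", "-")


print(web_safe_file_name("Unicode FileName ᗩ Test.txt"))  # Unicode FileName - Test.txt
print(web_safe_file_name("Café.txt"))                     # Café.txt
```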

Following Alex Couper's answer, I found a way in .NET to encode only non-ASCII characters by calling an internal method in HttpEncoder.

Calling internal methods is not recommended, as they may change in future versions of the framework! Furthermore, this will not work in all browsers, as mentioned above. I'm leaving this here in case someone absolutely needs it.

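using System.Reflection;
using System.Text;

// Look up the internal instance method HttpEncoder.UrlEncodeNonAscii(string, Encoding) via reflection.
var type = typeof(System.Web.Util.HttpEncoder);
var methodInfo = type.GetMethod("UrlEncodeNonAscii", BindingFlags.NonPublic | BindingFlags.Instance, null, new[] { typeof(string), typeof(Encoding) }, null)
    ?? throw new MissingMethodException("HttpEncoder.UrlEncodeNonAscii not found in this framework version.");
object[] parameters = { fileName, Encoding.UTF8 };

var encoder = new System.Web.Util.HttpEncoder();

// Percent-encode only the non-ASCII bytes of fileName's UTF-8 representation.
var encodedFileName = (string) methodInfo.Invoke(encoder, parameters);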
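Suppose the goal is only what the internal method's name suggests: percent-encoding non-ASCII characters as UTF-8 while leaving printable ASCII readable. In that case, a rough Python analogue is `urllib.parse.quote` with a wide `safe` set. This is an illustration under that assumption, not a port of `HttpEncoder.UrlEncodeNonAscii`. The helper name `encode_non_ascii` is hypothetical, and `%` is deliberately left out of the safe set so that existing `%` signs are escaped and the result stays unambiguous.

```python
from urllib.parse import quote

# Printable ASCII (0x20-0x7E) except '%', so the output can be decoded unambiguously.
_SAFE_ASCII = "".join(chr(c) for c in range(0x20, 0x7F) if chr(c) != "%")


def encode_non_ascii(text: str) -> str:
    """Percent-encode (as UTF-8) only characters outside printable ASCII, plus '%'."""
    return quote(text, safe=_SAFE_ASCII, encoding="utf-8")


print(encode_non_ascii("Unicode FileName ᗩ Test.txt"))  # Unicode FileName %E1%97%A9 Test.txt
print(encode_non_ascii("100%.txt"))                     # 100%25.txt
```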
CB-Dan
  • Oh wow, Microsoft has this function done and is hiding it! You can get the original function from the source CS file [here](http://referencesource.microsoft.com/#System.Web/xsp/system/Web/Util/HttpEncoder.cs). If someone is able to make it work in VB.NET, that would be great! I have no idea how to convert the `IntToHex((b >> 4) & 0xf);` part (and online converters can't either)! – foxontherock Oct 08 '14 at 14:37
  • Here's that part in VB.NET: `IntToHex((b >> 4) And &Hf)` – CB-Dan Oct 08 '14 at 19:05

In Java, it's correct to encode the fileName in the Content-Disposition header, but URLEncoder encodes spaces as +. To get %20 instead of + in the file name, replace + in the final encoded string:

java.net.URLEncoder.encode(fileName, "UTF-8").replace("+", "%20");
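Replacing `+` after encoding is safe because a literal `+` in the original name has already been encoded as `%2B`, so only the `+` signs that stand for spaces are affected. Below is a quick Python check of that reasoning. It uses `urllib.parse.quote_plus` as a stand-in for `URLEncoder.encode`, since the two agree on the characters used in this example. The sample name is illustrative.

```python
from urllib.parse import quote, quote_plus

name = "Unicode ᗩ a+b.txt"

# Spaces become '+', while the literal '+' in the name becomes '%2B'.
plus_encoded = quote_plus(name)
# So replacing '+' afterwards only touches the spaces.
fixed = plus_encoded.replace("+", "%20")

print(plus_encoded)                   # Unicode+%E1%97%A9+a%2Bb.txt
print(fixed)                          # Unicode%20%E1%97%A9%20a%2Bb.txt
print(fixed == quote(name, safe=""))  # True
```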