From "How to encode the filename parameter of Content-Disposition header in HTTP?" I learnt that the encoding defined in RFC 5987 is used to encode filenames in Content-Disposition headers. And from https://stackoverflow.com/a/1361646/739619 I learnt that support in major browsers has been good since at least November 2012. Both questions are rather old, yet I couldn't find a standard way to encode filenames this way in Python / Tornado. I have a

self.set_header('Content-Disposition', 'attachment;filename="{}.{}"'.format(basename, format))

in my code that fails when basename contains characters outside Latin-1, and I am looking for a standard way to encode it.

1 Answer

You can use urllib.parse.quote to percent-encode the filename, then prefix the result with the boilerplate filename*=UTF-8''. For instance, this simple server serves a file with a non-ASCII filename:

import tornado.httpserver
import tornado.ioloop
import tornado.web

import urllib.parse

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        filename = 'file "\'ä↭.txt'
        # safe='' makes quote() percent-encode '/' as well (it is left alone by default)
        encoded_filename = urllib.parse.quote(filename, safe='', encoding='utf-8')
        self.set_header(
            'Content-Disposition',
            'attachment;filename*=UTF-8\'\'{}'.format(encoded_filename))
        self.write('text file with file name file "\'ä↭.txt.\n')
        self.write('Most browsers will encode the " as _ or so.')


application = tornado.web.Application([
    (r"/", MainHandler),
])
http_server = tornado.httpserver.HTTPServer(application)
http_server.listen(8888)
tornado.ioloop.IOLoop.current().start()
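
The header construction above can also be factored into a small helper that sends a plain `filename` fallback alongside `filename*`, for clients that do not understand the extended parameter. This is only a sketch: `content_disposition` is a hypothetical name, not a Tornado or standard-library function, and replacing non-ASCII characters with `?` in the fallback is an assumption chosen for simplicity.

```python
import urllib.parse


def content_disposition(filename, disposition="attachment"):
    """Build a Content-Disposition value with an ASCII fallback and filename*.

    Hypothetical helper for illustration; not part of Tornado or the stdlib.
    """
    # ASCII-only fallback: characters outside ASCII are replaced by "?".
    fallback = filename.encode("ascii", "replace").decode("ascii")
    # Inside a quoted-string, backslash and double quote are backslash-escaped.
    fallback = fallback.replace("\\", "\\\\").replace('"', '\\"')
    # safe="" so that "/" is percent-encoded too.
    encoded = urllib.parse.quote(filename, safe="", encoding="utf-8")
    return "{}; filename=\"{}\"; filename*=UTF-8''{}".format(
        disposition, fallback, encoded)
```

In a handler this would be used as `self.set_header('Content-Disposition', content_disposition(name))`. For `'ä.txt'` the helper returns `attachment; filename="?.txt"; filename*=UTF-8''%C3%A4.txt`.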
phihag
  • Might want to note that it is [RFC2183](https://tools.ietf.org/html/rfc2183) that restricts parameter values to US-ASCII. – Travis Gockel Mar 25 '18 at 22:51
  • ...well, it works in Firefox, while Chrome seems to ignore `filename*` and to accept `filename` with the encoded value. Also, it seems to me that the boilerplate `UTF-8''` is not needed, at least not in Chrome and in Firefox... – Francesco Marchetti-Stasi Mar 27 '18 at 05:42
  • Works fine for me in Chrome (v66.0.3359.45). If you remove the boilerplate, you're not using the new RFC 5987 behavior, but the old RFC 2616 one. – phihag Mar 27 '18 at 09:04
  • You are right, if I run a server with your code it works for me as well; if I run my complete code it doesn't. I am extending your example piece after piece to reach mine and see where it starts giving the problem, I didn't get there yet, but eventually I will :) – Francesco Marchetti-Stasi Mar 27 '18 at 21:17
  • Got it, it was a typo so stupid that it's not worth mentioning :) now it works as it should. Thanks again! – Francesco Marchetti-Stasi Mar 28 '18 at 21:11
  • This answer is insufficient. The characters encoded by `urllib.parse.quote` do not match the specs. For example, the urllib method will not escape `'` but the spec requires it. See `attr-char` from [RFC 5987 § 3.2.1](https://tools.ietf.org/html/rfc5987#section-3.2.1) for the FULL list of characters that should not be percent-encoded. Everything else should be. – AndrewF May 03 '19 at 16:34
  • @AndrewF: `quote()` encodes `'` too. Though [`safe=""` is necessary to encode `/` too](https://replit.com/@zed1/rfc5987). – jfs Aug 26 '22 at 07:27
  • @jfs This was my confusion based on the linked document, which says: "Letters, digits, and the characters `'_.-~'` are never quoted." I don't know why someone would choose to put quotes inside a code block in the context of documenting a list of characters. – AndrewF Aug 30 '22 at 16:49
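
To make the `attr-char` point from the comments concrete, here is a minimal sketch of an encoder that leaves exactly the RFC 5987 § 3.2.1 `attr-char` set unescaped and percent-encodes every other UTF-8 byte. The names `ATTR_CHARS` and `rfc5987_encode` are made up for this example; `quote(..., safe="")` from the answer encodes a few more characters than strictly required, which should still be valid.

```python
import string

# attr-char per RFC 5987 section 3.2.1:
# ALPHA / DIGIT / "!" / "#" / "$" / "&" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
# Stored as a set of byte values (ints) for fast membership tests.
ATTR_CHARS = frozenset(
    (string.ascii_letters + string.digits + "!#$&+-.^_`|~").encode("ascii"))


def rfc5987_encode(value):
    """Percent-encode every UTF-8 byte of value that is not an attr-char."""
    return "".join(
        chr(byte) if byte in ATTR_CHARS else "%{:02X}".format(byte)
        for byte in value.encode("utf-8"))
```

With this, `rfc5987_encode("a'b/c ä.txt")` gives `a%27b%2Fc%20%C3%A4.txt`, so both `'` and `/` are escaped.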