I have a list of files with strangely encoded names, e.g. `02 - Charlie, Woody and You／Study #22.mp3`. That isn't so bad in itself, but there are a few particular characters that either Django or nginx seems to be snagging on.

>>> test = u'02 - Charlie, Woody and You／Study #22.mp3'
>>> test
u'02 - Charlie, Woody and You\uff0fStudy #22.mp3'

I am using nginx as a reverse proxy in front of Django's built-in web server (still in development stages) and PostgreSQL for my database. My database and tables are all en_US.UTF-8, and I am using pgadmin3 to view my tables outside of Django. My issue goes a little beyond my title. Firstly, how should I be saving possibly wacky filenames in my database? My current method is

'path': smart_unicode(path.lstrip(MUSIC_PATH)),
'filename': smart_unicode(file)

and when I pprint out the values they do show u'whateverthecrap'

I am not sure if that is how I should be doing it, but assuming it is, I now have issues trying to serve the download.

My download view looks something like this:

def song_download(request, song_id):
    song = get_object_or_404(Song, pk=song_id)
    url = u'/static_music/%s/%s' % (song.path, song.filename)

    print url

    response = HttpResponse()
    response['X-Accel-Redirect'] = url
    response['Content-Type'] = 'audio/mpeg'
    response['Content-Disposition'] = "attachment; filename=test.mp3"

    return response

and most files will download, but when I get to `02 - Charlie, Woody and You／Study #22.mp3` I receive this from Django: 'ascii' codec can't encode character u'\uff0f' in position 118: ordinal not in range(128), HTTP response headers must be in US-ASCII format.

How can I use an ASCII acceptable string if my filename is out of bounds? 02 - Charlie, Woody and You\uff0fStudy #22.mp3 doesn't seem to work...

EDIT 1

I am using Ubuntu for my OS.

TheLizardKing
  • It's not clear to me if your forward slash character `/` is meant here to be part of the filename or the parent directory plus filename? (Because, of course, the forward slash character '/' is not allowed as part of a filename in most modern filesystems, due to the confusion with the directory structure.) Nevertheless, to encode the forward slash in an ASCII-safe way, you could use '\u002f'... but I wouldn't recommend it. – ewall Apr 28 '10 at 19:05
  • It is more of a delimiter in this song title. I am not sure how they got it in, but it's not a true `/`, it's a `／`, which is probably why it's allowed. This was probably a poor example, but it has happened with many other Unicode characters. – TheLizardKing Apr 28 '10 at 19:25

1 Answer

Although `／` is an unusual and undesirable character, your script will break for any non-ASCII character.

response['X-Accel-Redirect'] = url

url is Unicode (and it isn't a URL, it's a filepath). Response headers are bytes. You'll need to encode it.

response['X-Accel-Redirect'] = url.encode('utf-8')

that's assuming you're running on a server with UTF-8 as the filesystem encoding.
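As a minimal sketch of that point (written for Python 3, where text and bytes are separate types; the variable names are only illustrative), the path from the question cannot be encoded as ASCII but encodes cleanly as UTF-8. Whether your proxy wants the raw bytes or a percent-encoded path is worth checking against your own nginx setup.

```python
# Illustrative only: the filename from the question, with U+FF0F (fullwidth solidus).
path = u'/static_music/02 - Charlie, Woody and You\uff0fStudy #22.mp3'

# Encoding as ASCII fails, which is the error Django reported.
try:
    path.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding as UTF-8 succeeds and yields a byte string usable as a header value,
# assuming the filesystem encoding on the server is UTF-8.
header_value = path.encode('utf-8')
```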

(Now, how to encode the filename in the Content-Disposition header... that's an altogether trickier question!)
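One common approach, following RFC 6266, is to send a plain-ASCII `filename` as a fallback alongside a percent-encoded `filename*` for browsers that understand it. The sketch below assumes Python 3 and uses a hypothetical helper name, `content_disposition`; it is not part of Django.

```python
from urllib.parse import quote


def content_disposition(filename):
    """Build an attachment header value with an ASCII fallback (hypothetical helper)."""
    # Fallback for older browsers: non-ASCII characters become '?'.
    fallback = filename.encode('ascii', 'replace').decode('ascii')
    # Quotes and backslashes would break the quoted-string, so swap them out.
    fallback = fallback.replace('\\', '_').replace('"', '_')
    # RFC 6266 / RFC 5987 form: charset, empty language tag, percent-encoded UTF-8.
    encoded = quote(filename, safe='')
    return "attachment; filename=\"%s\"; filename*=UTF-8''%s" % (fallback, encoded)


header = content_disposition(u'You\uff0fStudy #22.mp3')
```

In the view from the question this would be used as `response['Content-Disposition'] = content_disposition(song.filename)`. The resulting value is pure ASCII, so it satisfies the header restriction that triggered the original error.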

bobince
  • Haha sorry, url used to be a URL. I'm running Ubuntu 10.04 (Beta); is there a way to tell my filesystem's encoding? – TheLizardKing Apr 30 '10 at 16:54
  • Yeah, unless you've changed it, Ubuntu will be UTF-8. Every modern OS except Windows uses UTF-8. – bobince Apr 30 '10 at 18:39
  • According to [this answer](http://stackoverflow.com/a/20933751/484127), the `filename` parameter must always be ASCII, e.g. `filename=test.encode('ascii', 'replace')`, while newer browsers (following [RFC 6266](http://tools.ietf.org/html/rfc6266)) can use `filename*`, e.g. `filename*=UTF-8''` followed by `urlquote(test)`. – tutuDajuju Jul 06 '15 at 13:15