
My website allows users to upload files with any name. Some names, of course, will have non-ASCII characters. When a user uploads a file, I save it in a folder with its original name. However, when I try to download it, by accessing its location (for example, files/Tolstoy - How much land does a man need?.pdf), I get a 404. Is there some way to solve this, so that the files remain with their original name? Via Apache, maybe?

Sophivorus

3 Answers


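Just use URL encoding, also known as percent-encoding. It exists to handle exactly these characters in URLs on the web, and every URL printed into HTML should be URL-encoded.

For PHP, use rawurlencode, which follows the standard (RFC 3986) and encodes a space as %20. urlencode does not: it encodes a space as +, which is only treated as a space in query strings and form data, not in a path.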
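To make the difference concrete, here is a minimal sketch that runs both functions over the file name from the question. The variable names are only illustrative, and the `files/` prefix is the example path from the question, not a known detail of the real site.

```php
<?php
// File name taken from the question.
$name = 'Tolstoy - How much land does a man need?.pdf';

// rawurlencode(): space becomes %20 and "?" becomes %3F.
$raw = rawurlencode($name);

// urlencode(): space becomes "+", which a path does not read as a space.
$plus = urlencode($name);

// Encode only the file name, so the "/" after the folder stays a real separator.
$href = 'files/' . $raw;

echo $raw, "\n";   // Tolstoy%20-%20How%20much%20land%20does%20a%20man%20need%3F.pdf
echo $plus, "\n";  // Tolstoy+-+How+much+land+does+a+man+need%3F.pdf
echo $href, "\n";  // files/Tolstoy%20-%20How%20much%20land%20does%20a%20man%20need%3F.pdf
```

The unencoded "?" is probably what caused the 404 for this particular name: everything after it is read as a query string, so the server looks for a file called "Tolstoy - How much land does a man need".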

Edit: for this issue

PHP encodes "é" as "e%26%23769%3B", instead of "e%CC%81"

e%CC%81 is the UTF-8 percent-encoding of é written as e followed by the combining acute accent (U+0301). e%26%23769%3B is the percent-encoding of e&#769;, which is the HTML numeric character reference for the same thing. This means that either you are calling htmlentities() explicitly before URL-encoding, or your server setup does it automatically. That call is not strictly needed if proper character sets are in place (only htmlspecialchars() is actually needed), but it shouldn't break anything either.
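The encodings discussed here can be reproduced with a short sketch. It assumes PHP 7.0 or later for the `\u{...}` escapes and a UTF-8 setup. The variable names are made up for illustration, and decoding with html_entity_decode() is just one possible way to get back to raw UTF-8, not necessarily what the site should do.

```php
<?php
// "e" followed by U+0301 COMBINING ACUTE ACCENT (the decomposed form of é).
$decomposed = "e\u{0301}";

// The same letter as a single precomposed code point, U+00E9.
$precomposed = "\u{00E9}";

// The name as it looks when the accent arrives as an HTML numeric reference.
$entityForm = 'e&#769;';

echo rawurlencode($decomposed), "\n";   // e%CC%81
echo rawurlencode($precomposed), "\n";  // %C3%A9
echo rawurlencode($entityForm), "\n";   // e%26%23769%3B

// Decoding the reference first yields the raw UTF-8 bytes again.
$decoded = html_entity_decode($entityForm, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo rawurlencode($decoded), "\n";      // e%CC%81
```

If the file was actually stored on disk under the entity form, the link has to encode that same form to match it, which would explain why such links still work.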

Some online tools if you want to test these out:

eis
  • Sorry for the delay, I had to leave the computer. Using `urlencode` was also my first idea, but I can't get it to work. Here is a link to what I'm doing, it might help: http://filechan.net/ – Sophivorus Feb 07 '13 at 23:37
  • For some reason, PHP encodes "é" as "e%26%23769%3B", instead of "e%CC%81". This may point to where the problem is. – Sophivorus Feb 07 '13 at 23:56
  • urlencode() is something proprietary to php, rawurlencode is what you should use, since that one at least tries to be standard-compliant. – eis Feb 08 '13 at 09:24
  • @FelipeSchenone added an explanation for that – eis Feb 08 '13 at 09:33

Workaround: convert filenames to ASCII at upload. You will be happy with it.
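A minimal sketch of what that could look like, assuming a simple "replace everything unusual with an underscore" policy. The function name `asciiFilename` and the `file` fallback are hypothetical, and this is only one of many possible conversions.

```php
<?php
// Hypothetical helper: reduce an uploaded file name to a conservative ASCII subset.
function asciiFilename(string $name): string
{
    // Replace every byte that is not a letter, digit, dot, underscore or hyphen.
    $safe = preg_replace('/[^A-Za-z0-9._-]/', '_', $name);

    // Collapse runs of underscores, e.g. those left by a multi-byte character.
    $safe = preg_replace('/_+/', '_', $safe);

    // Fall back to a fixed name if nothing usable is left.
    return $safe === '' ? 'file' : $safe;
}

echo asciiFilename('Tolstoy - How much land does a man need?.pdf'), "\n";
// Tolstoy_-_How_much_land_does_a_man_need_.pdf
```

Different original names can collapse to the same ASCII name, so a real upload handler would also need some way to keep them apart.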

ern0
  • That should work, but I'd prefer to keep the names untouched. Is that possible? – Sophivorus Feb 07 '13 at 23:38
  • You can work around that with the *Content-Disposition* header, which lets you give the download any file name you like. http://stackoverflow.com/questions/1012437/uses-of-content-disposition-in-an-http-response-header – ern0 Feb 08 '13 at 08:57
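A sketch of the Content-Disposition idea from the last comment, assuming the file is stored under an ASCII name and served through a PHP script. The helper name `contentDisposition` is hypothetical. The `filename*` parameter uses the extended form from RFC 6266 / RFC 5987, and the ASCII fallback is assumed to contain no quotes or backslashes.

```php
<?php
// Hypothetical helper: build a Content-Disposition value that carries the
// original UTF-8 name alongside a plain ASCII fallback.
function contentDisposition(string $originalName, string $asciiFallback): string
{
    // filename* takes: charset, an empty language tag, then the percent-encoded name.
    return 'attachment; filename="' . $asciiFallback . '"'
        . "; filename*=UTF-8''" . rawurlencode($originalName);
}

// In a download script this would be sent before the file contents:
// header('Content-Disposition: ' . contentDisposition($originalName, $storedName));

echo contentDisposition("\u{00E9}.pdf", 'e.pdf'), "\n";
// attachment; filename="e.pdf"; filename*=UTF-8''%C3%A9.pdf
```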

Well, for some reason that I still don't understand, using rawurlencode() instead of urlencode() made it work.

However, the character é (among others, I'm sure) is still being encoded strangely (e%26%23769%3B instead of simply %C3%A9). Even stranger is that the links containing it work.

Sophivorus
  • Using `rawurlencode` solved my problem, so this is the answer that actually helped me out. However, after some edits, eis's answer is now more accurate and complete, so I chose it as the correct one. – Sophivorus Feb 09 '13 at 03:58