
I am downloading a file from the web and saving it to disk. That part is easy, but some web file names contain characters that are not allowed in file names on disk, most notably when I download a file such as script.js?something=somethingelse

I already found this post that deals with removing the illegal characters, but I don't want to do that. I want to somehow encode the filename, so that when I load the file from disk I can decode it and I will have the original name again.

I found this page describing which characters aren't allowed, but it's kind of vague, as it mentions

Any other character that the target file system does not allow.

What I came up with so far is to use HttpUtility.UrlEncode for this and it seems to be working alright. My fear is that it isn't 'right' and that it will fail in some scenarios. So does anyone know the "right" way of accomplishing this?

Niels Brinch
  • The filename is "script.js". The question mark and the rest isn't part of the filename. – Hans Passant Sep 02 '12 at 12:48
  • I understand that, but I want to save the file exactly the way it was requested, so that I can recognize it again the next time someone requests exactly the same thing. I might also want to support files such as PHP pages, which change their content based on the query string. – Niels Brinch Sep 02 '12 at 14:05

1 Answer


HttpUtility.UrlEncode is more than good enough for most purposes. You also have System.Net.WebUtility.UrlEncode and UrlDecode, which have the advantage of not requiring a reference to the System.Web assembly (which is not part of the Compact Framework, for example), but they are only available from .NET 4.5 onward.

However, if you are concerned about security, consider AntiXSS, which is part of the Microsoft Web Protection Library. It covers many more cases, such as escaping data for inclusion in JavaScript strings, and it supports whitelisting.
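
For the file-name case in the question, the UrlEncode/UrlDecode round trip could be sketched roughly as follows. This is only an illustration, not a complete solution: `FileNameCodec` and its method names are hypothetical, and the extra `*` replacement is an assumption based on UrlEncode leaving `*` unescaped while Windows rejects it in file names.

```csharp
using System;
using System.Net;

// Hypothetical helper (not part of the original answer): maps a requested
// name such as "script.js?something=somethingelse" to a name that can be
// stored on disk, and back again.
public static class FileNameCodec
{
    public static string Encode(string requestedName)
    {
        // UrlEncode percent-encodes '?', '=', '/', '\', ':', '"', '<', '>', '|'
        // and turns spaces into '+'. It leaves '*' untouched, so that one
        // character is escaped by hand (assumption: Windows is a target).
        return WebUtility.UrlEncode(requestedName).Replace("*", "%2A");
    }

    public static string Decode(string storedName)
    {
        // UrlDecode reverses both the percent-escapes and the '+' for space,
        // so the manually added "%2A" is turned back into '*' as well.
        return WebUtility.UrlDecode(storedName);
    }
}
```

With this sketch, `FileNameCodec.Decode(FileNameCodec.Encode(name))` should give back the original name. It does not address other file-system limits, such as reserved device names on Windows (CON, NUL and so on) or maximum path length, and encoding makes long query strings even longer.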

akton
  • When you say that UrlEncode is more than good enough, does that mean that you know it covers every possible scenario that could otherwise be deemed an illegal character when saving to disk? – Niels Brinch Sep 04 '12 at 08:17
  • @NielsBrinch If all you are doing is encoding URLs, it should be fine. However, if you are doing lots of HTML work, you rarely need just URL encoding. Sorry if I wasn't clear. – akton Sep 04 '12 at 08:19
  • To clarify: I encode URL filenames (the part after the last slash) and save them on disk, and characters such as "?" are not allowed in a file system... – Niels Brinch Sep 04 '12 at 08:28