9

We store a bunch of weird document names on our web server (people upload them) that contain various characters like spaces, ampersands, etc. When we generate links to these documents, we need to escape the names so the server can look up each file by its raw name in the database. However, none of the built-in .NET escape functions works correctly in all cases.

Take the document Hello#There.docx:

UrlEncode will handle this correctly:

HttpUtility.UrlEncode("Hello#There.docx");
"Hello%23There.docx"

However, UrlEncode will not handle Hello There.docx correctly:

HttpUtility.UrlEncode("Hello There.docx");
"Hello+There.docx"

The + symbol only stands for a space in URL query parameters, not in the path, so it is not valid for document names. Interestingly enough, this actually works on the Visual Studio test web server but not on IIS.

The UrlPathEncode function works fine for spaces:

HttpUtility.UrlPathEncode("Hello There.docx");
"Hello%20There.docx"

However, it will not escape other characters such as the # character:

HttpUtility.UrlPathEncode("Hello#There.docx");
"Hello#There.docx"

This link is invalid because the browser treats everything after the # as a URL fragment (hash), so that part never even gets to the server.

Is there a .NET utility method to escape all non-alphanumeric characters in a document name, or would I have to write my own?

Mike Christensen
  • 1
    How do you handle dups if users upload files with the same name? Wouldn't it be easier to machine-generate the names (like a guid, for example) and store the friendly, user-supplied, name in the database (along with the generated file name)? – Kirk Woll Feb 22 '12 at 18:09
  • You probably need something like this: Remove Illegal Characters From Path and Filenames (http://stackoverflow.com/questions/146134/how-to-remove-illegal-characters-from-path-and-filenames) – Agustin Meriles Feb 22 '12 at 18:10
  • @KirkWoll - Good question :) The URL *actually* looks like `/Docs/12345/My File.docx` - The 12345 is a unique key, but we want the IE "Save As" dialog to save the file with the same name as originally uploaded. We also verify the filename matches the key to prevent people from just guessing random documents (yea, not 100% secure but good enough).. – Mike Christensen Feb 22 '12 at 18:17
  • Not sure if "URL encoding" would be more appropriate than "escape" here since that's precisely what you're looking for. Then again it could create confusion with the `UrlEncode()` method which you mention. – BoltClock Feb 22 '12 at 18:19
  • @MikeChristensen: That should work just fine using a `content-disposition` HTTP header with `attachment;filename=My File.docx` – BrokenGlass Feb 22 '12 at 18:20
  • @BrokenGlass - Yea, we also set those HTTP response headers but I think there were some cases it wasn't working as we wanted. I can mess around with that route a bit more, but I also prefer the friendlier looking URLs if possible. – Mike Christensen Feb 22 '12 at 18:25

3 Answers

15

Have a look at the Uri.EscapeDataString Method:

Uri.EscapeDataString("Hello There.docx")  // "Hello%20There.docx"

Uri.EscapeDataString("Hello#There.docx")  // "Hello%23There.docx"
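
As a rough sketch of how this could be applied to the link format mentioned in the comments (`/Docs/12345/My File.docx`), the file name can be escaped as a single path segment before it is appended to the URL. The `DocumentLinks` class, the `BuildDocumentLink` method and the fixed `/Docs/` prefix are hypothetical names used only for illustration; they are not part of the original question.

```csharp
using System;

public static class DocumentLinks
{
    // Hypothetical helper: builds "/Docs/{key}/{escaped file name}".
    public static string BuildDocumentLink(int key, string fileName)
    {
        // EscapeDataString percent-encodes spaces, '#', '&', '+' and other
        // reserved characters, so the name survives as one path segment.
        return "/Docs/" + key + "/" + Uri.EscapeDataString(fileName);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(DocumentLinks.BuildDocumentLink(12345, "Hello There.docx"));
        Console.WriteLine(DocumentLinks.BuildDocumentLink(12345, "Hello#There.docx"));
    }
}
```

On the receiving side, `Uri.UnescapeDataString` reverses the encoding if the framework has not already decoded the path.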
dtb
  • Note that if you have non-ASCII characters, this will convert them to their percent-escaped UTF-8 representation, in which case your users may still get funny file names depending on the application that opens the file. For example "Hélo.docx" (which is displayed correctly by browsers) will become "H%C3%A9lo.docx". This may well be good enough in this case (and btw it is the same with UrlEncode), but if "user friendly" is a strong requirement, I suggest you check that as well. – Simon Mourier Feb 22 '12 at 18:26
  • +1 but, could you write a quick synopsis of when to use `UrlEncode` vs `UrlPathEncode` vs `EscapeDataString`? – BlueRaja - Danny Pflughoeft Feb 22 '12 at 19:56
6

I would approach it a different way: Do not use the document name as key in your look-up - use a Guid or some other id parameter that you can map to the document name on disk in your database. Not only would that guarantee uniqueness but you also would not have this problem of escaping in the first place.
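
A minimal sketch of that idea, assuming an in-memory dictionary as a stand-in for the database table; the `DocumentStore` class and its members are hypothetical names, not an existing API:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the database table that maps a generated id
// to the file name the user originally uploaded.
public class DocumentStore
{
    private readonly Dictionary<Guid, string> _namesById = new Dictionary<Guid, string>();

    // Registers an uploaded document and returns its generated key.
    public Guid Add(string originalFileName)
    {
        Guid id = Guid.NewGuid();
        _namesById[id] = originalFileName;
        return id;
    }

    // Looks up the original name for a key taken from the URL.
    public bool TryGetName(Guid id, out string originalFileName)
    {
        return _namesById.TryGetValue(id, out originalFileName);
    }

    // The "N" format yields 32 hex digits, so the link needs no escaping.
    public static string BuildLink(Guid id)
    {
        return "/Docs/" + id.ToString("N");
    }
}
```

Because the link contains only the generated id, two uploads with the same name cannot collide, and the friendly name is used only when the response is sent.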

BrokenGlass
  • 2
    Why can't you use a `content-disposition` HTTP header in your response? That should allow you to set the file name – BrokenGlass Feb 22 '12 at 18:22
0

You can prefix a string literal with the @ character to make it a verbatim string, in which backslashes are not treated as escape sequences. See the pieces of code below.

string str = @"\n\n\n\n";
Console.WriteLine(str);

Output: \n\n\n\n

string str1 = @"\df\%%^\^\)\t%%";
Console.WriteLine(str1);

Output: \df\%%^\^\)\t%%

This kind of formatting is very useful for pathnames and for creating regexes.

ada