On an ASP.NET web site, a user tried to upload a file with an emdash in its file name, to be sent as an email attachment. When the file was sent as an attachment (Exchange Server), its name was converted to _utf8_B_****.dat

So, on a .aspx page, I need to be able to detect if an emdash is present in the filename of a file that is uploaded as part of the Request.Files collection.

    string s = "a—b-";

    byte[] arr = Encoding.ASCII.GetBytes(s);
    foreach (byte element in arr)
    {
        Response.Write(element.ToString() + ",");
    }

The string above has an emdash as the second character and a normal hyphen as the fourth character.

The code above prints 97,63,98,45, to the screen.

I assumed that as an emdash is not a valid ASCII character, either an error would be thrown or some indication shown that it was not a valid ASCII character. Yet it returns 63.

How can I detect an emdash in a file name so I can say to the user 'Your file name has an invalid character in it'? I have seen other questions on this issue, but I can't get their answers to work.

Soner Gönül
Martin Smellworse
  • From http://www.asciitable.com/ you can see that 63 is the value for a question mark. When you call ASCII.GetBytes it forces the characters to ASCII, and uses a question mark when the character can't be converted. – David Jun 03 '13 at 13:37

2 Answers

This should probably do the trick:

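    foreach (char c in s) {
        // Code points 0-127 are ASCII; anything from 128 up is not.
        if (c >= 128) {
            // Response.Write has no format-string overload, so format first.
            Response.Write(string.Format("Non-ASCII char detected: {0}", c));
        }
    }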

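Encoding.ASCII.GetBytes converts the string to ASCII first and replaces every character it cannot represent with a question mark (byte 63), so the bytes it returns never contain a non-ASCII value. That is why the check has to run on the characters of the string itself.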
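If an exception is what you expected from the encoder, a stricter ASCII encoding can be requested. The following is a minimal sketch, not part of the original answer: the class name `AsciiCheck` and the method `IsPureAscii` are made up for illustration, while `Encoding.GetEncoding` with `EncoderExceptionFallback` is the framework mechanism for throwing instead of substituting `?`.

```csharp
using System;
using System.Text;

// Hypothetical helper, named for illustration only.
public static class AsciiCheck
{
    // ASCII encoding that throws on unencodable characters
    // instead of silently replacing them with '?'.
    private static readonly Encoding StrictAscii = Encoding.GetEncoding(
        "us-ascii",
        new EncoderExceptionFallback(),
        new DecoderExceptionFallback());

    // Returns true when every character of the input can be encoded as ASCII.
    public static bool IsPureAscii(string input)
    {
        try
        {
            StrictAscii.GetBytes(input);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // At least one character has no ASCII representation.
            return false;
        }
    }
}
```

With this in place, `AsciiCheck.IsPureAscii("a\u2014b-")` returns false because of the em dash (U+2014), whereas `AsciiCheck.IsPureAscii("a-b-")` returns true. The simple `c >= 128` loop gives the same answer and is cheaper; the exception-based version is mainly useful when you also want to encode the bytes.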

VeeTheSecond
  • [_"ASCII characters are limited to the lowest 128 Unicode characters, from U+0000 to U+007F. "_](http://msdn.microsoft.com/en-us/library/system.text.encoding.ascii.aspx). Your `if` condition will never be `true`. – CodeCaster Jun 03 '13 at 13:38
  • Good catch. I updated the condition with 128. By the way, the original test ( > 256) did catch the em dash from the example given. – VeeTheSecond Jun 03 '13 at 13:47
  • But it won't catch *&^%\: and other invalid (depending on the context) characters. Use framework-provided methods for work like this. – CodeCaster Jun 03 '13 at 13:49

> How can I detect an emdash in a file name so I can say to the user 'Your file name has an invalid character in it'?

That's the wrong way around, because tomorrow a user will upload a file with another Unicode character your filesystem or its API doesn't support. Besides, you don't need ASCII, because NTFS can handle a lot more than 7 bits per character.

The right question is: "What characters can I use to save a file"? But then again you'll be tied to the filesystem implementation. You'd best just generate a random filename and write the file to that path, and store the filename in a database so you can view the original filename.
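A minimal sketch of that approach, under stated assumptions: the class `UploadNaming`, the method `CreateStoredName`, and the in-memory dictionary are all hypothetical, and the dictionary only stands in for the database table that would hold the original names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, named for illustration only.
public static class UploadNaming
{
    // Stand-in for the database table: stored name -> original name.
    public static readonly Dictionary<string, string> OriginalNames =
        new Dictionary<string, string>();

    // Generates an on-disk name and remembers the user's original name.
    public static string CreateStoredName(string originalFileName)
    {
        // Keep only the extension from the upload; the rest is generated.
        string extension = Path.GetExtension(originalFileName);
        string storedName = Guid.NewGuid().ToString("N") + extension;

        OriginalNames[storedName] = originalFileName;
        return storedName;
    }
}
```

The stored name consists of 32 hexadecimal digits plus the original extension, so the em dash never reaches the file system or the mail server, and the original name can still be looked up for display. Note that the extension is still user-supplied, so it may need its own validation.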

If you do want to save the file under the user-provided path, you'll have to remove the characters returned by Path.GetInvalidPathChars() and Path.GetInvalidFileNameChars() from the input.
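As a rough sketch of that filtering for a bare file name (the class `FileNameCleaner` and the method `RemoveInvalidChars` are hypothetical names; only `Path.GetInvalidFileNameChars()` comes from the framework):

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, named for illustration only.
public static class FileNameCleaner
{
    // Drops every character the current platform reports as invalid in a file name.
    public static string RemoveInvalidChars(string fileName)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        return new string(fileName.Where(c => !invalid.Contains(c)).ToArray());
    }
}
```

The set returned by `Path.GetInvalidFileNameChars()` depends on the platform, and on Windows it does not include the em dash, which is a legal NTFS file name character. This filter therefore protects the file system, but it leaves the em dash in place.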

If the problem is not the filesystem but the mail system, please show relevant code and error message.

CodeCaster
  • The problem is not the file system. The file is saved on the server okay. But when the mail server sends that file, it converts it to a .dat. Change the file name so that the emdash is a normal hyphen, and the mail server sends it fine. There is no error code - it just converts the file to a .dat file and sends it. – Martin Smellworse Jun 03 '13 at 14:23
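Given that the attachment goes through once the emdash is swapped for a normal hyphen, one possible workaround is to detect or replace that single character before handing the file to the mail code. This is only a sketch: the class `DashFixer` and its methods are hypothetical names, and it assumes the emdash (U+2014) is the only character the mail server objects to, which the thread does not confirm.

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class DashFixer
{
    // U+2014 EM DASH, written as an escape so the source file stays ASCII.
    private const char EmDash = '\u2014';

    // True when the file name contains at least one em dash.
    public static bool ContainsEmDash(string fileName)
    {
        return fileName.IndexOf(EmDash) >= 0;
    }

    // Replaces every em dash with an ordinary hyphen-minus (U+002D).
    public static string ReplaceEmDash(string fileName)
    {
        return fileName.Replace(EmDash, '-');
    }
}
```

`DashFixer.ContainsEmDash` can drive the 'Your file name has an invalid character in it' message, while `DashFixer.ReplaceEmDash` quietly renames the attachment. Other non-ASCII characters may trigger the same conversion to .dat, so a broader `c >= 128` check may be the safer test.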