I was just using this most helpful link: How do I check if a given string is a legal / valid file name under Windows?

Inside some validation code I have something like the following. (Ignore that I'm not using a StringBuilder, and ignore the bug in forming the message: there's no need to report 'Colon' more than once if it shows up in the string more than once.)

string InvalidFileNameChars = new string(Path.GetInvalidFileNameChars());
Regex ContainsABadChar = new Regex("[" + Regex.Escape(InvalidFileNameChars) + "]");

MatchCollection BadChars = ContainsABadChar.Matches(txtFileName.Text);
if (BadChars.Count > 0)
{
    string Msg = "The following invalid characters were detected:\r\n\r\n";
    foreach (Match Bad in BadChars)
    {
        Msg += Bad.Value + "\r\n";
    }
    MessageBox.Show(Msg, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
    return;
}

That MessageBox will look something like (using the example that a colon was found):

-- begin --

The following invalid characters were detected:

:

-- end --

I'd like it to say something like:

-- begin --

The following invalid characters were detected:

Colon -> :

-- end --

I like having the English name. It's not critical, but I was curious whether there's a function out there like the following (it doesn't exist on the Char class, but may exist in some other class I'm not thinking of):

Char.GetEnglishName(':');

JustLooking
  • have you tried to write your own function? – Daniel A. White Jan 19 '12 at 18:30
  • @DanielA.White I'm not sure if he wants Unicode, but that's a lot of characters. The Charmap program does have that data. – McKay Jan 19 '12 at 18:34
  • :) Char.GetEnglishName(':'); I wish there were a method like this – Shoaib Shaikh Jan 19 '12 at 18:34
  • But looking at the code he's adapting it from, that only has like 6 invalid characters, so writing your own is not too hard. In any case, I think it's a good question: how do you get that data? – McKay Jan 19 '12 at 18:36
  • I don't think there is any way in .NET to do this. You should try creating a method for it; it's really easy. – Shoaib Shaikh Jan 19 '12 at 18:36
  • @DanielA.White - Well, not yet obviously. Why re-invent the wheel if it exists? I may be overlooking some class somewhere. Not exactly hard. Tedious. (Finding the English name for the 41 chars that GetInvalidFileNameChars returned, creating some map to do a lookup into; let's just hope that function doesn't return different results in different environments!) – JustLooking Jan 19 '12 at 18:38
  • @ShoaibShaikh But maybe even better would be `Char.GetName(':', culture)` – McKay Jan 19 '12 at 18:39
  • @McKay, in my testing it actually returns 41 characters! Err, I mean Values. – JustLooking Jan 19 '12 at 18:41
  • @McKay absolutely! culture will solve the problem for all – Shoaib Shaikh Jan 19 '12 at 18:42
  • @JustLooking Oh, I just guessed at how many are invalid. 41 is a bit more unwieldy – McKay Jan 19 '12 at 18:43
  • @McKay - No problem. I was surprised myself, or more like "oh yeah, forgot about these potential characters." – JustLooking Jan 19 '12 at 18:56
  • Full list of characters and their (sometimes unfriendly) names can be found at [unicode.org](http://unicode.org/Public/UNIDATA/Scripts.txt). – user7116 Jan 19 '12 at 19:17
  • Possible duplicate of http://stackoverflow.com/questions/2087682/finding-out-unicode-character-name-in-net – Matthew Strawbridge Jan 19 '12 at 21:15

4 Answers

9

You can just use the Basic Latin Unicode block (U+0000 to U+007F, which includes the C0 control characters) if you don't need to account for every character, ever.

You can define the table as a simple string array to make lookups fast:

string[] lookup = new string[128];
lookup[0x00]="Null character";
lookup[0x01]="Start of Heading";
lookup[0x02]="Start of Text";
lookup[0x03]="End-of-text character";
lookup[0x04]="End-of-transmission character";
lookup[0x05]="Enquiry character";
lookup[0x06]="Acknowledge character";
lookup[0x07]="Bell character";
lookup[0x08]="Backspace";
lookup[0x09]="Horizontal tab";
lookup[0x0A]="Line feed";
lookup[0x0B]="Vertical tab";
lookup[0x0C]="Form feed";
lookup[0x0D]="Carriage return";
lookup[0x0E]="Shift Out";
lookup[0x0F]="Shift In";
lookup[0x10]="Data Link Escape";
lookup[0x11]="Device Control 1";
lookup[0x12]="Device Control 2";
lookup[0x13]="Device Control 3";
lookup[0x14]="Device Control 4";
lookup[0x15]="Negative-acknowledge character";
lookup[0x16]="Synchronous Idle";
lookup[0x17]="End of Transmission Block";
lookup[0x18]="Cancel character";
lookup[0x19]="End of Medium";
lookup[0x1A]="Substitute character";
lookup[0x1B]="Escape character";
lookup[0x1C]="File Separator";
lookup[0x1D]="Group Separator";
lookup[0x1E]="Record Separator";
lookup[0x1F]="Unit Separator";
lookup[0x20]="Space";
lookup[0x21]="Exclamation mark";
lookup[0x22]="Quotation mark";
lookup[0x23]="Number sign";
lookup[0x24]="Dollar sign";
lookup[0x25]="Percent sign";
lookup[0x26]="Ampersand";
lookup[0x27]="Apostrophe";
lookup[0x28]="Left parenthesis";
lookup[0x29]="Right parenthesis";
lookup[0x2A]="Asterisk";
lookup[0x2B]="Plus sign";
lookup[0x2C]="Comma";
lookup[0x2D]="Hyphen-minus";
lookup[0x2E]="Full stop";
lookup[0x2F]="Slash";
lookup[0x30]="Digit Zero";
lookup[0x31]="Digit One";
lookup[0x32]="Digit Two";
lookup[0x33]="Digit Three";
lookup[0x34]="Digit Four";
lookup[0x35]="Digit Five";
lookup[0x36]="Digit Six";
lookup[0x37]="Digit Seven";
lookup[0x38]="Digit Eight";
lookup[0x39]="Digit Nine";
lookup[0x3A]="Colon";
lookup[0x3B]="Semicolon";
lookup[0x3C]="Less-than sign";
lookup[0x3D]="Equals sign";
lookup[0x3E]="Greater-than sign";
lookup[0x3F]="Question mark";
lookup[0x40]="At sign";
lookup[0x41]="Latin Capital Letter A";
lookup[0x42]="Latin Capital Letter B";
lookup[0x43]="Latin Capital Letter C";
lookup[0x44]="Latin Capital Letter D";
lookup[0x45]="Latin Capital Letter E";
lookup[0x46]="Latin Capital Letter F";
lookup[0x47]="Latin Capital Letter G";
lookup[0x48]="Latin Capital Letter H";
lookup[0x49]="Latin Capital Letter I";
lookup[0x4A]="Latin Capital Letter J";
lookup[0x4B]="Latin Capital Letter K";
lookup[0x4C]="Latin Capital Letter L";
lookup[0x4D]="Latin Capital Letter M";
lookup[0x4E]="Latin Capital Letter N";
lookup[0x4F]="Latin Capital Letter O";
lookup[0x50]="Latin Capital Letter P";
lookup[0x51]="Latin Capital Letter Q";
lookup[0x52]="Latin Capital Letter R";
lookup[0x53]="Latin Capital Letter S";
lookup[0x54]="Latin Capital Letter T";
lookup[0x55]="Latin Capital Letter U";
lookup[0x56]="Latin Capital Letter V";
lookup[0x57]="Latin Capital Letter W";
lookup[0x58]="Latin Capital Letter X";
lookup[0x59]="Latin Capital Letter Y";
lookup[0x5A]="Latin Capital Letter Z";
lookup[0x5B]="Left Square Bracket";
lookup[0x5C]="Backslash";
lookup[0x5D]="Right Square Bracket";
lookup[0x5E]="Circumflex accent";
lookup[0x5F]="Low line";
lookup[0x60]="Grave accent";
lookup[0x61]="Latin Small Letter A";
lookup[0x62]="Latin Small Letter B";
lookup[0x63]="Latin Small Letter C";
lookup[0x64]="Latin Small Letter D";
lookup[0x65]="Latin Small Letter E";
lookup[0x66]="Latin Small Letter F";
lookup[0x67]="Latin Small Letter G";
lookup[0x68]="Latin Small Letter H";
lookup[0x69]="Latin Small Letter I";
lookup[0x6A]="Latin Small Letter J";
lookup[0x6B]="Latin Small Letter K";
lookup[0x6C]="Latin Small Letter L";
lookup[0x6D]="Latin Small Letter M";
lookup[0x6E]="Latin Small Letter N";
lookup[0x6F]="Latin Small Letter O";
lookup[0x70]="Latin Small Letter P";
lookup[0x71]="Latin Small Letter Q";
lookup[0x72]="Latin Small Letter R";
lookup[0x73]="Latin Small Letter S";
lookup[0x74]="Latin Small Letter T";
lookup[0x75]="Latin Small Letter U";
lookup[0x76]="Latin Small Letter V";
lookup[0x77]="Latin Small Letter W";
lookup[0x78]="Latin Small Letter X";
lookup[0x79]="Latin Small Letter Y";
lookup[0x7A]="Latin Small Letter Z";
lookup[0x7B]="Left Curly Bracket";
lookup[0x7C]="Vertical bar";
lookup[0x7D]="Right Curly Bracket";
lookup[0x7E]="Tilde";
lookup[0x7F]="Delete";

Then, all you need to do is:

var englishName = lookup[(int)'~'];

Or:

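// Extension methods must be declared in a static class; this assumes
// lookup is a static field of that same class.
public static string ToEnglishName(this char c)
{
    int i = (int)c;
    if (i < lookup.Length)
        return lookup[i];
    return "Unknown"; // anything outside the 128-entry table
}

var name = ':'.ToEnglishName(); // Colon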
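
Tying this back to the question, here is a minimal sketch of building the "Colon -> :" message while reporting each bad character only once. The class name InvalidCharReport, the method Build, and the small Names table are hypothetical and only for illustration; the table is deliberately partial, and you could swap in the full 128-entry array above. The invalid characters are passed in as a parameter because Path.GetInvalidFileNameChars() may return a different set on different platforms.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class InvalidCharReport
{
    // Hypothetical, deliberately partial name table (an assumption for
    // illustration); extend it to cover the characters you care about.
    private static readonly Dictionary<char, string> Names = new Dictionary<char, string>
    {
        { ':', "Colon" }, { '*', "Asterisk" }, { '?', "Question mark" },
        { '"', "Quotation mark" }, { '<', "Less-than sign" },
        { '>', "Greater-than sign" }, { '|', "Vertical bar" },
        { '/', "Slash" }, { '\\', "Backslash" }
    };

    // Returns null when no character of text appears in invalidChars.
    public static string Build(string text, char[] invalidChars)
    {
        // Distinct() ensures each offending character is listed once.
        List<char> bad = text.Where(c => Array.IndexOf(invalidChars, c) >= 0)
                             .Distinct()
                             .ToList();
        if (bad.Count == 0)
            return null;

        var msg = new StringBuilder("The following invalid characters were detected:\r\n\r\n");
        foreach (char c in bad)
        {
            string name;
            if (!Names.TryGetValue(c, out name))
                name = "U+" + ((int)c).ToString("X4"); // fall back to the code point

            msg.Append(name);
            if (!char.IsControl(c))
                msg.Append(" -> ").Append(c); // skip the glyph for control characters
            msg.Append("\r\n");
        }
        return msg.ToString();
    }
}
```

In the question's handler this would be called as InvalidCharReport.Build(txtFileName.Text, Path.GetInvalidFileNameChars()), showing the MessageBox only when the result is not null. For the input "a:b:c?" with ':' and '?' treated as invalid, the body lists "Colon -> :" and "Question mark -> ?" once each.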
Ryan Emerle
5

The issue you'll run into is that you need to be able to represent the whole Unicode space, which is big. If you really want to do this, drop the contents of this page into a Dictionary<char, string> (called _charToName below), then use this extension method on char:

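public static string ToName(this char c)
{
    // TryGetValue overwrites its out argument (with null on a miss),
    // so the fallback has to be returned explicitly.
    string result;
    if (_charToName.TryGetValue(c, out result))
        return result;
    return ""; // or "unknown" or null or whatever
}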

// ...

string name = c.ToName();
plinth
4

I compiled a dictionary of character names, gathered from various sources, for a personal tool I made to search through Unicode characters: http://jumpingfishes.com/unicodechars.htm

The dictionary is expressed as a JavaScript array and contains 20,761 definitions. Feel free to borrow my JavaScript to create a C# dictionary:
http://jumpingfishes.com/unicodeDescriptions.js

Edit: Better yet, here's the text file I used to generate my JavaScript. It might be a slightly easier source to parse when generating a C# dictionary. Each line contains the character code in hex, followed by a tab, followed by the character description.
http://jumpingfishes.com/unicodeDictionary.txt
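
As a rough sketch of turning that text file into a C# dictionary, assuming each line really is "hex code, tab, description" as described (I haven't verified the file itself), something like the following might do. The names UnicodeNameTable and Parse are hypothetical. The dictionary is keyed by int rather than char so that code points above U+FFFF still fit.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class UnicodeNameTable
{
    // Parses lines of the assumed form "<hex code>\t<description>".
    // Lines that don't match that shape are skipped.
    public static Dictionary<int, string> Parse(IEnumerable<string> lines)
    {
        var table = new Dictionary<int, string>();
        foreach (string line in lines)
        {
            string[] parts = line.Split('\t');
            if (parts.Length < 2)
                continue;

            int codePoint;
            if (int.TryParse(parts[0], NumberStyles.HexNumber,
                             CultureInfo.InvariantCulture, out codePoint))
            {
                table[codePoint] = parts[1].Trim();
            }
        }
        return table;
    }
}
```

With a local copy of the file, usage could be as simple as var names = UnicodeNameTable.Parse(File.ReadLines(path)); followed by names.TryGetValue((int)':', out name).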

gilly3
3

As mentioned by @rik-hemsley in an answer to the question "Finding out Unicode character name in .Net":

It's easier than ever now, as there's a NuGet package named Unicode Information.

With this, you can just call:

UnicodeInfo.GetName(character)
JJS
  • I was hoping this would work, but the library doesn't seem to ever return anything but null for control characters (e.g. CR or ETX) – Mark W Feb 02 '16 at 20:27