How to get an "English name" for a character?

Question

I was just using this most helpful link: How do I check if a given string is a legal / valid file name under Windows?

Inside some validation code I have something like the following. (Ignore the fact that I'm not using a StringBuilder, and ignore the bug in forming the message: there's no need to tell the user about 'Colon' more than once if it shows up in the string more than once.)

string InvalidFileNameChars = new string(Path.GetInvalidFileNameChars());
Regex ContainsABadChar = new Regex("[" + Regex.Escape(InvalidFileNameChars) + "]");

MatchCollection BadChars = ContainsABadChar.Matches(txtFileName.Text);
if (BadChars.Count > 0)
{
    string Msg = "The following invalid characters were detected:\r\n\r\n";
    foreach (Match Bad in BadChars)
    {
        Msg += Bad.Value + "\r\n";
    }
    MessageBox.Show(Msg, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
    return;
}

That MessageBox will look something like this (assuming a colon was found):

-- begin --

The following invalid characters were detected:

:

-- end --

I'd like it to say something like:

-- begin --

The following invalid characters were detected:

Colon -> :

-- end --

I'd like to have the English name. It's not a deal-breaker, but I was curious whether there's some function out there like the following (it doesn't exist on the Char class, but may exist in some other class I'm not thinking of):

Char.GetEnglishName(':');

@DanielA.White I'm not sure if he wants Unicode, but that's a lot of characters. the Program Charmap does have that data. — McKay, Jan 19 '12 at 18:34
:) Char.GetEnglishName(':'); i wish there will be a method like this — Shoaib Shaikh, Jan 19 '12 at 18:34
But looking at the code he's adapting it from, that only has like 6 invalid characters, so writing your own is not too hard. In any case, I think it's a good question: how do you get that data? — McKay, Jan 19 '12 at 18:36
i don't think there is any way in .net to do this.. you should try creating some method for this.. this is really easy.. — Shoaib Shaikh, Jan 19 '12 at 18:36
@DanielA.White - Well, not yet obviously. Why re-invent the wheel if it exists? I may be overlooking some class somewhere. Not exactly hard. Tedious. (Finding the english name for the 41 chars that GetInvalidFileNameChars returned, creating some map to do a lookup into; let's just hope that function doesn't return different results in different environments!) — JustLooking, Jan 19 '12 at 18:38
@ShoaibShaikh But maybe even better would be `Char.GetName(':', culture)` — McKay, Jan 19 '12 at 18:39
@McKay, in my testing it actually returns 41 characters! Err, I mean Values. — JustLooking, Jan 19 '12 at 18:41
@McKay absolutely!!!!!!! culture will solve the problem for all — Shoaib Shaikh, Jan 19 '12 at 18:42
@JustLooking Oh, I just guessed at how many are invalid. 41 is a bit more unwieldy — McKay, Jan 19 '12 at 18:43
@McKay - No problem. I was surprised myself, or more like "oh yeah, forgot about these potential characters." — JustLooking, Jan 19 '12 at 18:56
Full list of characters and their (sometimes unfriendly) names can be found at [unicode.org](http://unicode.org/Public/UNIDATA/Scripts.txt). — user7116, Jan 19 '12 at 19:17
Possible duplicate of http://stackoverflow.com/questions/2087682/finding-out-unicode-character-name-in-net — Matthew Strawbridge, Jan 19 '12 at 21:15

score 9 · Accepted Answer · answered Jan 19 '12 at 19:10

You can just use the C0 Controls and Basic Latin Unicode block (U+0000 to U+007F) if you don't need to account for every character, ever.

You can define the table as a simple string array to make lookups fast:

string[] lookup = new string[128];
lookup[0x00]="Null character";
lookup[0x01]="Start of Heading";
lookup[0x02]="Start of Text";
lookup[0x03]="End-of-text character";
lookup[0x04]="End-of-transmission character";
lookup[0x05]="Enquiry character";
lookup[0x06]="Acknowledge character";
lookup[0x07]="Bell character";
lookup[0x08]="Backspace";
lookup[0x09]="Horizontal tab";
lookup[0x0A]="Line feed";
lookup[0x0B]="Vertical tab";
lookup[0x0C]="Form feed";
lookup[0x0D]="Carriage return";
lookup[0x0E]="Shift Out";
lookup[0x0F]="Shift In";
lookup[0x10]="Data Link Escape";
lookup[0x11]="Device Control 1";
lookup[0x12]="Device Control 2";
lookup[0x13]="Device Control 3";
lookup[0x14]="Device Control 4";
lookup[0x15]="Negative-acknowledge character";
lookup[0x16]="Synchronous Idle";
lookup[0x17]="End of Transmission Block";
lookup[0x18]="Cancel character";
lookup[0x19]="End of Medium";
lookup[0x1A]="Substitute character";
lookup[0x1B]="Escape character";
lookup[0x1C]="File Separator";
lookup[0x1D]="Group Separator";
lookup[0x1E]="Record Separator";
lookup[0x1F]="Unit Separator";
lookup[0x20]="Space";
lookup[0x21]="Exclamation mark";
lookup[0x22]="Quotation mark";
lookup[0x23]="Number sign";
lookup[0x24]="Dollar sign";
lookup[0x25]="Percent sign";
lookup[0x26]="Ampersand";
lookup[0x27]="Apostrophe";
lookup[0x28]="Left parenthesis";
lookup[0x29]="Right parenthesis";
lookup[0x2A]="Asterisk";
lookup[0x2B]="Plus sign";
lookup[0x2C]="Comma";
lookup[0x2D]="Hyphen-minus";
lookup[0x2E]="Full stop";
lookup[0x2F]="Slash";
lookup[0x30]="Digit Zero";
lookup[0x31]="Digit One";
lookup[0x32]="Digit Two";
lookup[0x33]="Digit Three";
lookup[0x34]="Digit Four";
lookup[0x35]="Digit Five";
lookup[0x36]="Digit Six";
lookup[0x37]="Digit Seven";
lookup[0x38]="Digit Eight";
lookup[0x39]="Digit Nine";
lookup[0x3A]="Colon";
lookup[0x3B]="Semicolon";
lookup[0x3C]="Less-than sign";
lookup[0x3D]="Equals sign";
lookup[0x3E]="Greater-than sign";
lookup[0x3F]="Question mark";
lookup[0x40]="At sign";
lookup[0x41]="Latin Capital letter A";
lookup[0x42]="Latin Capital letter B";
lookup[0x43]="Latin Capital letter C";
lookup[0x44]="Latin Capital letter D";
lookup[0x45]="Latin Capital letter E";
lookup[0x46]="Latin Capital letter F";
lookup[0x47]="Latin Capital letter G";
lookup[0x48]="Latin Capital letter H";
lookup[0x49]="Latin Capital letter I";
lookup[0x4A]="Latin Capital letter J";
lookup[0x4B]="Latin Capital letter K";
lookup[0x4C]="Latin Capital letter L";
lookup[0x4D]="Latin Capital letter M";
lookup[0x4E]="Latin Capital letter N";
lookup[0x4F]="Latin Capital letter O";
lookup[0x50]="Latin Capital letter P";
lookup[0x51]="Latin Capital letter Q";
lookup[0x52]="Latin Capital letter R";
lookup[0x53]="Latin Capital letter S";
lookup[0x54]="Latin Capital letter T";
lookup[0x55]="Latin Capital letter U";
lookup[0x56]="Latin Capital letter V";
lookup[0x57]="Latin Capital letter W";
lookup[0x58]="Latin Capital letter X";
lookup[0x59]="Latin Capital letter Y";
lookup[0x5A]="Latin Capital letter Z";
lookup[0x5B]="Left Square Bracket";
lookup[0x5C]="Backslash";
lookup[0x5D]="Right Square Bracket";
lookup[0x5E]="Circumflex accent";
lookup[0x5F]="Low line";
lookup[0x60]="Grave accent";
lookup[0x61]="Latin Small Letter A";
lookup[0x62]="Latin Small Letter B";
lookup[0x63]="Latin Small Letter C";
lookup[0x64]="Latin Small Letter D";
lookup[0x65]="Latin Small Letter E";
lookup[0x66]="Latin Small Letter F";
lookup[0x67]="Latin Small Letter G";
lookup[0x68]="Latin Small Letter H";
lookup[0x69]="Latin Small Letter I";
lookup[0x6A]="Latin Small Letter J";
lookup[0x6B]="Latin Small Letter K";
lookup[0x6C]="Latin Small Letter L";
lookup[0x6D]="Latin Small Letter M";
lookup[0x6E]="Latin Small Letter N";
lookup[0x6F]="Latin Small Letter O";
lookup[0x70]="Latin Small Letter P";
lookup[0x71]="Latin Small Letter Q";
lookup[0x72]="Latin Small Letter R";
lookup[0x73]="Latin Small Letter S";
lookup[0x74]="Latin Small Letter T";
lookup[0x75]="Latin Small Letter U";
lookup[0x76]="Latin Small Letter V";
lookup[0x77]="Latin Small Letter W";
lookup[0x78]="Latin Small Letter X";
lookup[0x79]="Latin Small Letter Y";
lookup[0x7A]="Latin Small Letter Z";
lookup[0x7B]="Left Curly Bracket";
lookup[0x7C]="Vertical bar";
lookup[0x7D]="Right Curly Bracket";
lookup[0x7E]="Tilde";
lookup[0x7F]="Delete";

Then, all you need to do is:

var englishName = lookup[(int)'~'];

Or:

 public static string ToEnglishName(this char c)
 {
    int i = (int)c;
    if( i < lookup.Length )
       return lookup[i];
    return "Unknown";
 }

 var name = ':'.ToEnglishName(); // Colon
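To tie the table back to the validation code in the question, here is a minimal sketch, not a definitive implementation. The class name `CharNames` and the helper `DescribeInvalidChars` are hypothetical, and the table is abbreviated to a few entries for illustration (the full 128-entry list above would go in its place). It assumes the lookup array lives in a static field of the same static class as the extension method, and it reports each offending character only once.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CharNames
{
    // Abbreviated table for illustration only; substitute the full list from the answer.
    private static readonly string[] Lookup = BuildLookup();

    private static string[] BuildLookup()
    {
        var table = new string[128];
        table[0x22] = "Quotation mark";
        table[0x2A] = "Asterisk";
        table[0x2F] = "Slash";
        table[0x3A] = "Colon";
        table[0x3C] = "Less-than sign";
        table[0x3E] = "Greater-than sign";
        table[0x3F] = "Question mark";
        table[0x5C] = "Backslash";
        table[0x7C] = "Vertical bar";
        return table;
    }

    public static string ToEnglishName(this char c)
    {
        // Unset table entries are null, so they fall back to "Unknown" as well.
        return (c < Lookup.Length ? Lookup[c] : null) ?? "Unknown";
    }

    // Hypothetical helper: returns the "Name -> char" message,
    // or null when fileName contains none of invalidChars.
    public static string DescribeInvalidChars(string fileName, char[] invalidChars)
    {
        var bad = new HashSet<char>(invalidChars);
        var seen = new HashSet<char>();
        var msg = new StringBuilder("The following invalid characters were detected:\r\n\r\n");
        foreach (char c in fileName)
        {
            // seen.Add returns false for repeats, so each character is listed once.
            if (bad.Contains(c) && seen.Add(c))
                msg.Append(c.ToEnglishName()).Append(" -> ").Append(c).Append("\r\n");
        }
        return seen.Count > 0 ? msg.ToString() : null;
    }
}
```

In the question's code this could be called as `CharNames.DescribeInvalidChars(txtFileName.Text, Path.GetInvalidFileNameChars())`, showing the MessageBox only when the result is not null. Taking the invalid characters as a parameter keeps the helper independent of whatever the current platform returns.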

Awesome. So far it's coming down to you and plinth for the green check mark. I just up-voted. — JustLooking, Jan 19 '12 at 19:42
Although plinth answered first, you went the extra mile (cut-paste!). Green check mark for you! — JustLooking, Jan 20 '12 at 16:36

score 5 · Answer 2 · answered Jan 19 '12 at 18:37

The issue you'll run into is that you need to be able to represent the Unicode space, which is going to be big. If you really want to do this, drop the contents of this page into a Dictionary, then use this extension method on char:

public static string ToName(this char c)
{
    string result = ""; // or "unknown" or null or whatever
    _charToName.TryGetValue(c, out result);
    return result;
}

// ...

string name = c.ToName();

plinth

There is no reason to initialize result. TryGetValue will set it to null if it does not find the key. – Fantius Jan 19 '12 at 18:42
Eeek! Maybe I'll just add the 41 and if "TryGetValue" doesn't return a result they'll get "Unknown" as the English name. :) – JustLooking Jan 19 '12 at 18:51
Up-Vote. Thanks. Still deciding on the green check-mark. You are in the running! – JustLooking Jan 19 '12 at 19:43

score 4 · Answer 3 · gilly3 · edited Jan 19 '12 at 19:46

I compiled a dictionary of character names, gathered from various sources, for a personal tool I made to search through Unicode characters: http://jumpingfishes.com/unicodechars.htm

The dictionary is expressed as a JavaScript array and contains 20,761 definitions. Feel free to borrow my JavaScript to create a C# dictionary:
http://jumpingfishes.com/unicodeDescriptions.js

Edit: Better yet, here's the text file I used to generate my JavaScript. It might be a slightly easier source to parse when generating a C# dictionary. Each line contains the character code in hex, followed by a tab, followed by the character description.
http://jumpingfishes.com/unicodeDictionary.txt
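A rough sketch of how a file in that format might be loaded into a C# dictionary. The class and method names (`UnicodeNameTable`, `Parse`, `NameOf`) are hypothetical, and the code assumes each line is a bare hex code (no `0x` prefix), a tab, and the description; lines that don't fit that shape are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class UnicodeNameTable
{
    // Parses lines of the form "<hex code>\t<description>" into a code-to-name map.
    public static Dictionary<int, string> Parse(IEnumerable<string> lines)
    {
        var names = new Dictionary<int, string>();
        foreach (string line in lines)
        {
            int tab = line.IndexOf('\t');
            if (tab <= 0) continue; // blank line or no tab separator: skip it

            int code;
            if (int.TryParse(line.Substring(0, tab), NumberStyles.HexNumber,
                             CultureInfo.InvariantCulture, out code))
            {
                names[code] = line.Substring(tab + 1).Trim();
            }
        }
        return names;
    }

    // Looks up a char, falling back to "Unknown" when the table has no entry.
    public static string NameOf(Dictionary<int, string> names, char c)
    {
        string name;
        return names.TryGetValue(c, out name) ? name : "Unknown";
    }
}
```

With the file saved locally, something like `UnicodeNameTable.Parse(File.ReadLines("unicodeDictionary.txt"))` would build the table once at startup (the local file name is an assumption).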

answered Jan 19 '12 at 19:23

your tool would benefit from searching the decimal or hex of the character. – JJS Feb 23 '15 at 01:34
@jjs - I agree. I could have sworn I had that in there. – gilly3 Feb 23 '15 at 01:56
still awesome. have you seen amp-what.com? – JJS Feb 23 '15 at 02:15

score 3 · Answer 4 · edited May 23 '17 at 11:59

As mentioned by @rik-hemsley in an answer to the question "Finding out Unicode character name in .Net":

It's easier than ever now, as there's a package on NuGet named Unicode Information.

With this, you can just call:

UnicodeInfo.GetName(character)

answered Feb 23 '15 at 01:32

JJS

I was hoping this would work, but the library doesn't seem to ever return anything but NULL for control characters (eg CR, or ETX) – Mark W Feb 02 '16 at 20:27
