.NET doesn't have a built-in method for iterating letters or character codes in the sense that you ask, since they sit in a middle ground between the character encoding that .NET uses (UTF-16) and graphemes ("user-perceived characters").
UTF-16 encodes each Unicode codepoint in one or two code units (.NET's Char, aliased in C# as char). A String (aliased in C# as string) is a counted sequence of UTF-16 code units.
The Char struct does have some methods that deal with codepoints (as Int32) and some awkward ones that can help iterate codepoints. Note: codepoints are usually written with a U+ prefix and 4 or 5 hexadecimal digits.
The StringInfo class has some methods that iterate graphemes (aka "text elements").
But, since you ask about Unicode character codes ("codepoints"), the UnicodeInformation NuGet package might be the best option.
With it, you can also get the description of each codepoint, as published by Unicode.org. Their website has a lot of information, including complete lists of codepoints.
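    using System;
    using System.Globalization; // StringInfo
    using System.Unicode;       // UnicodeInfo and AsCodePointEnumerable (UnicodeInformation NuGet package)

    var s = "Put your repair hobby on your résume\u0301.";
    // takes two UTF-16 code units.
    // Second é is two codepoints: "e\u0301", base and combining codepoints

    // Outer loop: one iteration per grapheme (text element).
    var e = StringInfo.GetTextElementEnumerator(s);
    while (e.MoveNext())
    {
        var grapheme = (String)e.Current;
        Console.WriteLine(grapheme);

        // Inner loop: each codepoint that makes up the grapheme.
        foreach (var codepoint in grapheme.AsCodePointEnumerable())
        {
            var info = UnicodeInfo.GetCharInfo(codepoint);
            Console.WriteLine($" U+{codepoint:X04} {info.Name} {info.Category}");
        }
    }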
Also, in case you are not aware, UTF-16 (or its forward-compatible precursor UCS-2) has been the native character encoding in many environments for approximately 25 years: VB4/5/6/A/Script, Java, JavaScript, Windows API, NTFS, SQL NCHAR and NVARCHAR, ….