One difference between form C and form D is how letters with accents are represented: form C uses a single letter-with-accent codepoint, while form D separates that into a letter and an accent.
For instance, an "à" can be codepoint 224 (U+00E0, "Latin small letter A with grave"), or codepoint 97 (U+0061, "Latin small letter A") followed by codepoint 768 (U+0300, "Combining grave accent"). A char-by-char comparison sees these as different; normalising both strings to the same form first lets the comparison succeed.
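As a quick check of that claim, here is a minimal sketch that builds both representations with escape sequences and compares them before and after normalisation. The variable names are only illustrative.

```csharp
using System;
using System.Text;

// Two representations of the same visible character.
string composed = "\u00E0";     // single codepoint: a with grave
string decomposed = "a\u0300";  // "a" followed by the combining grave accent

// The == operator on strings compares char by char (ordinal), so these differ.
bool rawEqual = composed == decomposed;

// After bringing both to the same form, the comparison succeeds.
bool normalizedEqual =
    composed.Normalize(NormalizationForm.FormC) ==
    decomposed.Normalize(NormalizationForm.FormC);

Console.WriteLine(rawEqual);        // False
Console.WriteLine(normalizedEqual); // True
```

Normalising both sides to FormD instead would work just as well; what matters is that both strings end up in the same form.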
A side effect is that it becomes easy to write a "remove accents" method:
    using System.Globalization;
    using System.Linq;
    using System.Text;

    public static string RemoveAccents(string input)
    {
        // Normalizing to FormD splits accented letters into letter + accent,
        // the Where clause removes those accents (and other non-spacing marks),
        // and the remaining chars are assembled into a new string.
        return new string(input
            .Normalize(NormalizationForm.FormD)
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray());
    }
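A short usage sketch, with the method repeated as a local function so the snippet runs on its own. The second call is there to show a limitation: letters that have no decomposition into letter + accent, such as "Ł", pass through unchanged, because FormD has nothing to split off.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Same logic as the RemoveAccents method above, written as a local function.
static string RemoveAccents(string input) => new string(input
    .Normalize(NormalizationForm.FormD)
    .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
    .ToArray());

// "Crème brûlée", written with escapes so the source file encoding does not matter.
Console.WriteLine(RemoveAccents("Cr\u00E8me br\u00FBl\u00E9e")); // Creme brulee

// "Łódź": the accents on o and z are removed, but Ł (U+0141) has no decomposition and stays.
Console.WriteLine(RemoveAccents("\u0141\u00F3d\u017A")); // Łodz
```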
Or have the "highly secure" ROT13 encoding work with accents:
    string Rot13(string input)
    {
        var v = input.Normalize(NormalizationForm.FormD)
            .Select(c => {
                if ((c >= 'a' && c <= 'm') || (c >= 'A' && c <= 'M'))
                    return (char)(c + 13);
                if ((c >= 'n' && c <= 'z') || (c >= 'N' && c <= 'Z'))
                    return (char)(c - 13);
                return c;
            });
        return new string(v.ToArray()).Normalize(NormalizationForm.FormC);
    }
This turns "Crème brûlée" into "Per̀zr oeĥyŕr" (and vice versa, of course). It first splits each "character with accent" codepoint into separate "character" and "accent" codepoints (FormD), then applies the ROT13 translation to just the letters, and finally tries to recombine them (FormC).
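The recombination step only "tries" because not every letter + accent pair has a single precomposed codepoint. A small sketch of the two cases that occur in the output above, with illustrative variable names:

```csharp
using System;
using System.Text;

// "h" + combining circumflex has a precomposed form (U+0125), so FormC merges the pair.
string hCircumflex = "h\u0302".Normalize(NormalizationForm.FormC);

// "r" + combining grave has no precomposed form, so FormC leaves it as two codepoints.
string rGrave = "r\u0300".Normalize(NormalizationForm.FormC);

Console.WriteLine(hCircumflex.Length); // 1
Console.WriteLine(rGrave.Length);      // 2
```

This is why the ROT13 result mixes single-codepoint letters with letter + combining accent pairs, even though it is returned in FormC.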