(NB: I am not a Unicode guru.)
The issue may be that U+0329 is a combining character. If you replace the }} in the string with alphabetic characters you'll see the issue a little more clearly:
"the̩"
Note that in this case U+0329 appears as a modifier on the e in the string, altering the character's visual representation. The same is happening, albeit strangely, with the }} character pair.
What you have is not a pair of RIGHT CURLY BRACKET characters. It's one RIGHT CURLY BRACKET followed by the combining sequence RIGHT CURLY BRACKET + COMBINING VERTICAL LINE BELOW. When you compare this using the default StringComparison option, the second RIGHT CURLY BRACKET in your search string doesn't match, because it is being matched against a combining sequence (a base character plus its combining mark) that is a different text element.
Using StringComparison.Ordinal, however, changes the way the string is processed: it ignores the combining behaviour of U+0329 and simply compares the raw UTF-16 code unit values.
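A minimal sketch of the difference, assuming a made-up sample string (your real string will differ) and a simple IndexOf search. I've only stated the ordinal result, since the culture-sensitive result depends on the runtime's globalization library and settings:

```csharp
using System;

// Hypothetical sample: "{{x}}" followed by U+0329 COMBINING VERTICAL LINE BELOW.
string text = "{{x}}\u0329";
string needle = "}}";

// Dump the UTF-16 code units so the trailing combining mark is visible.
foreach (char c in text)
{
    Console.WriteLine($"U+{(int)c:X4}");
}

// Ordinal search compares code units one by one; the combining mark is just
// another value that happens to follow the match, so "}}" is found at index 3.
int ordinalIndex = text.IndexOf(needle, StringComparison.Ordinal);
Console.WriteLine($"Ordinal: {ordinalIndex}");

// Culture-sensitive search treats "}" + U+0329 as a single text element.
// The result is printed rather than asserted because it can vary by platform.
int cultureIndex = text.IndexOf(needle, StringComparison.CurrentCulture);
Console.WriteLine($"CurrentCulture: {cultureIndex}");
```

If the two printed indexes differ on your machine, that is the combining mark at work.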
Where this gets interesting is when you're searching for accented characters, since there are often multiple character sequences that look the same and compare the same when using CurrentCulture or InvariantCulture but are composed differently.
Take for example LATIN SMALL LETTER E WITH ACUTE "é" (code point U+00E9) compared to the sequence LATIN SMALL LETTER E + COMBINING ACUTE ACCENT (U+0065, U+0301) "é". They look the same and compare the same when you use InvariantCulture, but not when you use Ordinal comparison.
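Here's a rough sketch of that comparison. The variable names are mine, and the InvariantCulture result is printed rather than promised, since it can change if the application runs in invariant-globalization mode:

```csharp
using System;

// Precomposed form: a single code point, U+00E9.
string precomposed = "\u00E9";
// Decomposed form: "e" (U+0065) followed by U+0301 COMBINING ACUTE ACCENT.
string decomposed = "e\u0301";

// The two strings render alike but have different lengths in code units.
Console.WriteLine($"Lengths: {precomposed.Length} vs {decomposed.Length}");

// Ordinal compares the code unit values, which differ, so this is False.
bool ordinalEqual = string.Equals(precomposed, decomposed, StringComparison.Ordinal);
Console.WriteLine($"Ordinal equal: {ordinalEqual}");

// Linguistic comparison normally treats the two forms as equivalent.
bool invariantEqual = string.Equals(precomposed, decomposed, StringComparison.InvariantCulture);
Console.WriteLine($"InvariantCulture equal: {invariantEqual}");
```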
You can do a lot of work just trying to define the 'right' way to handle this, let alone implementing it. I often find it's better to simply accept that some character sequences are going to give you trouble and pick one StringComparison value to go with.