
"* {{IPA|ɪntɹəvaɪtl}}̩".IndexOf("}}") returns -1.

"* {{IPA|ɪntɹəvaɪtl}}".IndexOf("}}") returns 18.

I expect the first sample to return 18.

Notice there is a special character U+0329 following the ending double brackets in the first sample.

Combining Vertical Line Below U+0329

Why is it returning -1 instead of 18? Even in an ordinal search, the string contains "}}", so it should not return -1.

live-love
  • I wonder if it's culture related. What if you try `"foo".IndexOf("", StringComparison.InvariantCulture);` – Evan Trimboli Feb 04 '18 at 23:45
  • I tried, it doesn't work. – live-love Feb 04 '18 at 23:49
  • -1 means it didn't find a match and if there is an extra hidden character in the search string that isn't in the string you are searching then a match will not be found. – juharr Feb 04 '18 at 23:52
  • Yes, I already know that. The hidden character follows the closing brackets, so it should return 18. – live-love Feb 04 '18 at 23:52
  • The hidden character comes after }} – live-love Feb 04 '18 at 23:53
  • Evan, your comment helped, I tried StringComparison.Ordinal and it worked as expected. Thanks. – live-love Feb 04 '18 at 23:59
  • For anyone interested there's some good info in this answer: https://stackoverflow.com/questions/492799/difference-between-invariantculture-and-ordinal-string-comparison – Evan Trimboli Feb 05 '18 at 00:02
  • Possible duplicate of [Difference between InvariantCulture and Ordinal string comparison](https://stackoverflow.com/questions/492799/difference-between-invariantculture-and-ordinal-string-comparison) – Dour High Arch Feb 05 '18 at 00:36
  • Not a duplicate. Not the same question. The answer works, but my question still remains: why does IndexOf return -1 when there is }} in the string being searched, given that it's an ordinal search? – live-love Feb 05 '18 at 01:01
  • “Duplicate” means the answer is a duplicate, not the question. U+0329 is a combining character, it changes the preceding character; `"}" + "\u0329"` is a single glyph different from "}", it is not a string of two glyphs. – Dour High Arch Feb 05 '18 at 01:54
  • The link above mentions nothing about glyphs. But thanks for the explanation. Your comment answers my question. Keep flagging away, your marshal badge is almost there! – live-love Feb 05 '18 at 02:02

1 Answer


(NB: I am not a Unicode guru.)

The issue may be that the character U+0329 is a combining character. If you replace the }} in the string with alphabetic characters you'll see the issue a little more clearly:

"the̩"

Note that in this case U+0329 has attached itself to the e as a modifier, altering how that character is rendered. The same thing is happening, albeit less visibly, with the second } of the }} pair.

What you have is not a pair of RIGHT CURLY BRACKET characters. It is one RIGHT CURLY BRACKET followed by a single combined text element made of RIGHT CURLY BRACKET + COMBINING VERTICAL LINE BELOW. IndexOf(string) without a StringComparison argument performs a culture-sensitive search (StringComparison.CurrentCulture), so the second RIGHT CURLY BRACKET in your search string doesn't match: it is being compared against a text element (a base character plus its combining mark) that is different.
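To see that grouping for yourself, here is a minimal sketch using `StringInfo` from `System.Globalization`. The class and method names (`TextElementDemo`, `TextElements`) are my own and purely illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TextElementDemo
{
    // Splits a string into text elements: a base character
    // together with any combining marks that follow it.
    public static string[] TextElements(string s)
    {
        var elements = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
        {
            elements.Add(e.GetTextElement());
        }
        return elements.ToArray();
    }

    public static void Main()
    {
        // The tail of the sample string: "}}" followed by U+0329.
        string tail = "}}\u0329";

        Console.WriteLine(tail.Length);               // 3 chars
        Console.WriteLine(TextElements(tail).Length); // 2 text elements
    }
}
```

The second text element is the two-char sequence `"}\u0329"`, which is why a linguistic search does not treat it as a plain `}`.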

Using StringComparison.Ordinal, however, skips the linguistic processing altogether: it ignores the combining behavior of U+0329 and simply compares the raw UTF-16 code units one by one.
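A minimal sketch of the ordinal search on the string from the question (`IndexOfDemo` and `FindOrdinal` are hypothetical names of mine). I have left the culture-sensitive result unannotated, since it may depend on the runtime and its globalization settings; the question reports -1.

```csharp
using System;

public static class IndexOfDemo
{
    // Ordinal search: compares raw UTF-16 code units, no linguistic rules.
    public static int FindOrdinal(string text, string value)
    {
        return text.IndexOf(value, StringComparison.Ordinal);
    }

    public static void Main()
    {
        // The first sample from the question, with U+0329 written as an escape.
        string sample = "* {{IPA|ɪntɹəvaɪtl}}\u0329";

        Console.WriteLine(FindOrdinal(sample, "}}")); // 18

        // Default overload: culture-sensitive (StringComparison.CurrentCulture).
        Console.WriteLine(sample.IndexOf("}}"));
    }
}
```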


Where this gets interesting is when you're searching for accented characters, since there are often multiple character sequences that look the same and compare as equal under CurrentCulture or InvariantCulture but are composed differently.

Take for example LATIN SMALL LETTER E WITH ACUTE "é" (code point U+00E9) compared to the sequence LATIN SMALL LETTER E + COMBINING ACUTE ACCENT (U+0065, U+0301) "é". They look the same and compare as equal when you use InvariantCulture, but not when you use Ordinal comparison.
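Here is a rough sketch of that comparison. The normalization step at the end is my own addition, not something the question asked about: it is one possible way to make an ordinal comparison treat the two forms alike, assuming canonical composition (NFC) is acceptable for your data. `AccentDemo` and `EqualAfterNormalizing` are illustrative names.

```csharp
using System;
using System.Text;

public static class AccentDemo
{
    // Assumption: normalizing both sides to NFC before an ordinal
    // comparison is acceptable for the data being compared.
    public static bool EqualAfterNormalizing(string a, string b)
    {
        return string.Equals(
            a.Normalize(NormalizationForm.FormC),
            b.Normalize(NormalizationForm.FormC),
            StringComparison.Ordinal);
    }

    public static void Main()
    {
        string precomposed = "\u00E9"; // single code point
        string decomposed = "e\u0301"; // e + COMBINING ACUTE ACCENT

        // Different code units, so ordinal comparison says they differ.
        Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.Ordinal)); // False

        // Linguistic comparison; result depends on the runtime's globalization support.
        Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.InvariantCulture));

        // After normalizing both to NFC the code units match.
        Console.WriteLine(EqualAfterNormalizing(precomposed, decomposed)); // True
    }
}
```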

You can do a lot of work just trying to define what is the 'right' way to handle this, let alone implementing that way. I often find it's better to simply accept that some character sequences are going to give you trouble and pick one StringComparison value to go with.

Corey