I am trying to write an equivalent of C#'s string.IndexOf(string)
that can handle Unicode characters encoded as surrogate pairs.
I am able to get the index when only comparing single characters, like in the code below:
    public static int UnicodeIndexOf(this string input, string find)
    {
        return input.ToTextElements().ToList().IndexOf(find);
    }

    public static IEnumerable<string> ToTextElements(this string input)
    {
        var e = StringInfo.GetTextElementEnumerator(input);
        while (e.MoveNext())
        {
            yield return e.GetTextElement();
        }
    }
But if I pass a string of more than one character as the find
argument, it doesn't work, because each text element contains only a single character to compare against.
Are there any suggestions as to how to go about writing this?
Thanks for any and all help.
EDIT:
Below is an example of why this is necessary:
CODE
    Console.WriteLine("HolyCowBUBBYYYYY".IndexOf("BUBB"));
    Console.WriteLine("HolyCow@BUBBYY@YY@Y".IndexOf("BUBB"));
OUTPUT
    9
    8
Notice that where I replace the surrogate-pair character with @, the returned index changes.
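One possible direction is to split both the input and the find string into text elements and then search for the find sequence inside the input sequence. Below is a minimal sketch of that idea, not a definitive implementation. The names UnicodeStringExtensions, ToTextElementList and TextElementIndexOf are hypothetical and chosen only for illustration. It assumes an ordinal comparison is wanted and that the result should be counted in text elements rather than UTF-16 code units. Which sequences count as a single text element can differ between .NET versions, although a surrogate pair is treated as one element in all of them.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper class; the names are illustrative only.
public static class UnicodeStringExtensions
{
    // Splits a string into text elements, so a surrogate pair comes back
    // as one element instead of two UTF-16 code units.
    public static List<string> ToTextElementList(this string input)
    {
        var elements = new List<string>();
        var e = StringInfo.GetTextElementEnumerator(input);
        while (e.MoveNext())
        {
            elements.Add(e.GetTextElement());
        }
        return elements;
    }

    // Returns the text-element index of the first occurrence of find,
    // or -1 if find does not occur. The comparison is ordinal (an assumption).
    public static int TextElementIndexOf(this string input, string find)
    {
        var haystack = input.ToTextElementList();
        var needle = find.ToTextElementList();

        // Mirror string.IndexOf, which returns 0 for an empty search string.
        if (needle.Count == 0)
        {
            return 0;
        }

        // Naive sliding-window search over text elements.
        for (int i = 0; i + needle.Count <= haystack.Count; i++)
        {
            int j = 0;
            while (j < needle.Count &&
                   string.Equals(haystack[i + j], needle[j], StringComparison.Ordinal))
            {
                j++;
            }
            if (j == needle.Count)
            {
                return i;
            }
        }
        return -1;
    }
}
```

For example, with "\U0001F600" (a character outside the Basic Multilingual Plane, stored as a surrogate pair), "a\U0001F600bc".IndexOf("bc", StringComparison.Ordinal) returns 3, while "a\U0001F600bc".TextElementIndexOf("bc") returns 2. The search is quadratic in the worst case, which should be fine for short strings; a longer input might justify a smarter algorithm.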