I am trying to write an equivalent of C#'s `string.IndexOf(string)` that correctly handles Unicode characters encoded as surrogate pairs.

I am able to get the index when only comparing single characters, like in the code below:

    public static int UnicodeIndexOf(this string input, string find)
    {
        return input.ToTextElements().ToList().IndexOf(find);
    }

    public static IEnumerable<string> ToTextElements(this string input)
    {
        var e = StringInfo.GetTextElementEnumerator(input);
        while (e.MoveNext())
        {
            yield return e.GetTextElement();
        }
    }

But if I pass a multi-character string as `find`, it doesn't work, because each text element contains only a single character to compare against.

Are there any suggestions as to how to go about writing this?

Thanks for any and all help.

EDIT:

Below is an example of why this is necessary:

CODE

 Console.WriteLine("HolyCowBUBBYYYYY".IndexOf("BUBB"));
 Console.WriteLine("HolyCow@BUBBYY@YY@Y".IndexOf("BUBB"));

OUTPUT

9
8

Notice that the values change where I replace the character with @.
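The difference comes from `string.IndexOf` counting UTF-16 code units, where a surrogate pair takes two units but forms a single text element. The following is a minimal sketch of that effect. The class name `CodeUnitDemo`, the method `Measure`, and the use of U+1F600 as a stand-in surrogate-pair character are assumptions made for illustration, not part of the original example.

```csharp
using System;
using System.Globalization;

public static class CodeUnitDemo
{
    // Hypothetical helper: reports where `find` starts (in UTF-16 code units),
    // plus the length of `input` in code units and in text elements.
    public static (int CodeUnitIndex, int CodeUnits, int TextElements) Measure(string input, string find)
    {
        // Ordinal comparison is assumed here so the result does not depend on culture.
        int index = input.IndexOf(find, StringComparison.Ordinal);
        int textElements = new StringInfo(input).LengthInTextElements;
        return (index, input.Length, textElements);
    }
}

// Usage (U+1F600 is an assumed stand-in for a surrogate-pair character):
//   CodeUnitDemo.Measure("HolyCow\U0001F600BUBB", "BUBB")  ->  (9, 13, 12)
//   CodeUnitDemo.Measure("HolyCow@BUBB", "BUBB")           ->  (8, 12, 12)
```

With the surrogate pair, the string is one code unit longer than its text-element count, which is why the reported index shifts by one compared with the `@` version.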

Ibrennan208
  • Use the same encoding for both strings and you are good – Steve May 04 '18 at 20:28
  • @Steve I added some information to my question. Are those strings the same encoding or is there a difference? – Ibrennan208 May 04 '18 at 20:39
  • @Ibrennan208, from your initial implementation it looks like you are trying to find a *single* grapheme, because you are using an `IndexOf` on an array of strings that are in effect `TextElements`, but from your sample data it looks like you actually want to find an index of a substring with length > 1 grapheme. Can you specify which solution you are seeking? (Just run your code on your test data - it won't work - indexOf will return -1) – ironstone13 May 04 '18 at 20:48
  • @ironstone13 I want to find an index of a substring with length > 1. In the question I explained that I can get it to work if I am only comparing a string with a single character, but I want to extend it to allow the user to input a multi-character string to find the index of. – Ibrennan208 May 04 '18 at 20:58

1 Answer

You basically want to find the index of one string array within another string array, where each array element is a single text element. We can adapt code from this question for that:

    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Linq;

    public static class Extensions {
        // Returns the index of find within input, counted in text elements
        // (not UTF-16 code units), or -1 if it is not found.
        public static int UnicodeIndexOf(this string input, string find, StringComparison comparison = StringComparison.CurrentCulture) {
            return IndexOf(
               // split input into text elements
               input.ToTextElements().ToArray(),
               // split searched value into text elements
               find.ToTextElements().ToArray(),
               comparison);
        }

        // code from another answer: naive search for one array inside another
        private static int IndexOf(string[] haystack, string[] needle, StringComparison comparison) {
            var len = needle.Length;
            var limit = haystack.Length - len;
            for (var i = 0; i <= limit; i++) {
                var k = 0;
                for (; k < len; k++) {
                    if (!String.Equals(needle[k], haystack[i + k], comparison)) break;
                }

                // every element of needle matched starting at position i
                if (k == len) return i;
            }

            return -1;
        }

        // Enumerates the text elements of input, keeping each surrogate pair together
        public static IEnumerable<string> ToTextElements(this string input) {
            var e = StringInfo.GetTextElementEnumerator(input);
            while (e.MoveNext()) {
                yield return e.GetTextElement();
            }
        }
    }
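As an alternative sketch, one could avoid building the two arrays by searching in UTF-16 code units and then mapping the hit back to a text-element index with `StringInfo.ParseCombiningCharacters`, which returns the starting offset of every text element. This is a different technique from the array search in this answer. The names `TextElementSearch` and `TextElementIndexOf` are hypothetical, and the comparison is assumed to be ordinal rather than culture-sensitive.

```csharp
using System;
using System.Globalization;

public static class TextElementSearch
{
    // Hypothetical helper: returns the text-element index of the first ordinal
    // match of `find` that starts and ends on a text-element boundary, or -1.
    public static int TextElementIndexOf(string input, string find)
    {
        if (find.Length == 0) return 0;

        // Sorted start offsets (in UTF-16 code units) of each text element.
        int[] starts = StringInfo.ParseCombiningCharacters(input);

        int from = 0;
        while (true)
        {
            int hit = input.IndexOf(find, from, StringComparison.Ordinal);
            if (hit < 0) return -1;

            // The position of `hit` in `starts` is its text-element index.
            int element = Array.BinarySearch(starts, hit);
            int end = hit + find.Length;
            bool endsOnBoundary = end == input.Length || Array.BinarySearch(starts, end) >= 0;

            if (element >= 0 && endsOnBoundary) return element;

            // The match cut through a text element; keep searching.
            from = hit + 1;
        }
    }
}
```

For example, `TextElementSearch.TextElementIndexOf("HolyCow\U0001F600BUBB", "BUBB")` yields 8, the same value as for `"HolyCow@BUBB"`, because the surrogate pair counts as one text element (U+1F600 is an assumed stand-in character).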
Evk