More efficient way to get all indexes of a character in a string

Question

Instead of looping through each character to see if it's the one you want then adding the index your on to a list like so:

     var foundIndexes = new List<int>();
     for (int i = 0; i < myStr.Length; i++)
     {
        if (myStr[i] == 'a')
           foundIndexes.Add(i);
     }

There is no faster way to do it for one character; however, if you are searching for longer patterns, different algorithms exist that allow you to skip more than one character at a time like the [Boyer–Moore string search algorithm](http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm). — Olivier Jacot-Descombes, Oct 07 '12 at 15:37

score 33 · Accepted Answer · edited Sep 19 '18 at 07:48

33

You can use String.IndexOf, see example below:

    string s = "abcabcabcabcabc";
    var foundIndexes = new List<int>();

    long t1 = DateTime.Now.Ticks;
    for (int i = s.IndexOf('a'); i > -1; i = s.IndexOf('a', i + 1))
        {
         // for loop end when i=-1 ('a' not found)
                foundIndexes.Add(i);
        }
    long t2 = DateTime.Now.Ticks - t1; // read this value to see the run time

edited Sep 19 '18 at 07:48

KaraKaplanKhan

734
1
14
31

answered Oct 07 '12 at 03:29

m.zam

447
4
6

+1 While agree with @cdhowie comment, It is another option to try if fighting for last bit of performance. I'm not sure if JIted `for` will always remove boundaries checks that are not present in IndexOf, may be faster for long string with rare occurences of the character. – Alexei Levenkov Oct 07 '12 at 03:53
your comments is base on assumption. i've tested. my solution is much more faster. test it to believe it. compare tick value before and after of the two solutions. i've edited my answer to show you how i test it – m.zam Oct 07 '12 at 15:27
sorry, i have done few more test. it actually depends what is your combination, it can be faster and some time it can also be slower. but anyway, i think my code is cleaner. – m.zam Oct 07 '12 at 15:42
zam, this did in fact decrease the time by about 30%. Thank you! – Mark W Oct 16 '12 at 14:17
+1 very nice, this is good even if you're searching for string and not single character – Fabio Marcolini Sep 19 '13 at 19:34
just asking; but wouldn't it be faster and more straightforward instead of calling IndexOf for every character just iterate every character in the string? – CyberFox Nov 22 '17 at 03:36
2

I would just point to use `StopWatch` instead deducting ticks on the "benchmark" side – Gonzo345 Dec 18 '19 at 07:24

fubo · Answer 2 · 2019-11-17T14:41:40.273

27

I use the following extension method to yield all results:

public static IEnumerable<int> AllIndexesOf(this string str, string searchstring)
{
    int minIndex = str.IndexOf(searchstring);
    while (minIndex != -1)
    {
        yield return minIndex;
        minIndex = str.IndexOf(searchstring, minIndex + searchstring.Length);
    }
}

usage:

IEnumerable<int> result = "foobar".AllIndexesOf("o"); // [1,2]

Side note to a edge case: This is a string approach which works for one or more characters. In case of "fooo".AllIndexesOf("oo") the result is just 1 https://dotnetfiddle.net/CPC7D2

edited Nov 17 '19 at 14:41

answered Jun 03 '14 at 13:11

fubo

44,811
17
103
137

1

Hey fubo. It might be more correct to just use `+ 1 ` in place of `+ searchstring.Length`. Consider the example of `AllIndexesOf("11111", "11")`. This returns `(0, 2)`, because it searches from the end of the original `11` at index 0, and then from index 2 onwards. The actual answer should be `(0, 1, 2, 3)` as it is possible to find `11` from all of these indexes. Here is my fork of your .NET Fiddle which demonstrates the updated code, which produces the "correct" answer, depending on your world view: https://dotnetfiddle.net/RGQWd8. – Aaron Newton Dec 25 '19 at 05:57
@AaronNewton right, that's my side note and which result is correct depends on the requirements – fubo Dec 27 '19 at 07:25

score 13 · Answer 3 · answered Oct 07 '12 at 03:14

13

How about

string xx = "The quick brown fox jumps over the lazy dog";

char search = 'f';

var result = xx.Select((b, i) => b.Equals(search) ? i : -1).Where(i => i != -1);

answered Oct 07 '12 at 03:14

Nikhil Agrawal

47,018
22
121
208

Hardly more efficient algorithmically, you're just using LINQ to do a the same thing. Unless there is some LINQ magic going that combines the two searches (I'm not intimately familiar with LINQ optimizations) on you have actually introduced a second loop and made the performance worse. – Ed S. Oct 07 '12 at 03:16
@EdS.: No LINQ does delayed execution in it. Its actually a single loop that will work. – Nikhil Agrawal Oct 07 '12 at 03:18
2

Single loop or not, it's not faster and it just convolutes what should be a simple piece of code. – Ed S. Oct 07 '12 at 03:19
1

just tested in my parallel algorithm here: http://stackoverflow.com/questions/12765329/plinq-query-need-to-know-how-many-iterations-performed, much slower – Mark W Oct 07 '12 at 03:22
3

LINQ will almost always be slower than doing the same thing imperatively. LINQ-to-objects relies on delegates, which are effectively function pointers. The pointer will be invoked for each element in the source enumerable. A function pointer call *cannot* be optimized by the C# compiler, the JIT, or the host CPU. They are much slower than inline code in every practical way. (I'm not saying never use LINQ, but be aware that you are trading performance for readability and maintainability -- assuming your LINQ code is readable and maintainable.) – cdhowie Oct 07 '12 at 03:24
3

Quite the shitstorm for posting a one-liner. Any decent .NET programmer knows that using linq is a performance tradeoff like @cdhowie stated. I will not be using this solution as i my piece is more performance critical. For a onetimer this is perfect. And thanks to poster for actually putting it here for our convenience – CyberFox Nov 22 '17 at 03:24

score 7 · Answer 4 · answered Oct 07 '12 at 03:23

The raw iteration is always better & most optimized.

Unless it's a bit complex task, you never really need to seek for a better optimized solution...

So I would suggest to continue with :

var foundIndexes = new List<int>();

for (int i = 0; i < myStr.Length; i++)

     if (myStr[i] == 'a') foundIndexes.Add(i);

score 5 · Answer 5 · answered Oct 07 '12 at 03:22

If the string is short, it may be more efficient to search the string once and count up the number of times the character appears, then allocate an array of that size and search the string a second time, recording the indexes in the array. This will skip any list re-allocations.

What it comes down to is how long the string is and how many times the character appears. If the string is long and the character appears few times, searching it once and appending indicies to a List<int> will be faster. If the character appears many times, then searching the string twice (once to count, and once to fill an array) may be faster. Exactly where the tipping point is depends on many factors that can't be deduced from your question.

If you need to search the string for multiple different characters and get a list of indexes for those characters separately, it may be faster to search through the string once and build a Dictionary<char, List<int>> (or a List<List<int>> using character offsets from \0 as the indicies into the outer array).

Ultimately, you should benchmark your application to find bottlenecks. Often the code that we think will perform slowly is actually very fast, and we spend most of our time blocking on I/O or user input.

+1, I was writing up something similar, but I don't have time to run any benchmarks, so I decided not to. — Ed S., Oct 07 '12 at 03:23
howie, if you don't mind my asking, what would you use to benchmark the app? I don't have premium VS so I can't use Analyze, only use Professional edition. — Mark W, Oct 07 '12 at 03:54

score 0 · Answer 6 · answered Dec 10 '19 at 16:05

    public static List<int> GetSubstringLocations(string text, string searchsequence)
    {
        try
        {
            List<int> foundIndexes = new List<int> { };

            int i = 0;
            while (i < text.Length)
            {
                int cindex = text.IndexOf(searchsequence, i);

                if (cindex >= 0)
                {
                    foundIndexes.Add(cindex);
                    i = cindex;
                }

                i++;

            }

            return foundIndexes;
        }
        catch (Exception ex) { }
        return new List<int> { };
    }

score 0 · Answer 7 · answered Jul 10 '20 at 11:34

public static String[] Split(this string s,char c = '\t')
    {
        if (s == null) return null;
        var a = new List<int>();
        int i = s.IndexOf(c);
        if (i < 0) return new string[] { s };
        a.Add(i);
        for (i = i+1; i < s.Length; i++) if (s[i] == c) a.Add(i);
        var result = new string[a.Count +1];            
        int startIndex = 0;
        result[0] = s.Remove(a[0]);
        for(i=0;i<a.Count-1;i++)
        {
            result[i + 1] = s.Substring(a[i] + 1, a[i + 1] - a[i] - 1);
        }
        result[a.Count] = s.Substring(a[a.Count - 1] + 1);
        return result;
    }

score 0 · Answer 8 · answered Jul 05 '23 at 15:04

Another solution with less code:

public static int[] IndexOfAll(this string source, char target)
{
    return source.Select((c, idx) => c == target ? idx : -1).Where(idx => idx != -1).ToArray();
}

usage:

var result = "foobar".IndexOfAll('o'); // [1,2]

More efficient way to get all indexes of a character in a string

8 Answers8

Linked