106

Unless I am missing an obvious built-in method, what is the quickest way to get the nth occurrence of a string within a string?

I realize that I could loop the IndexOf method by updating its start index on each iteration of the loop. But doing it this way seems wasteful to me.

DavidRR
  • 18,291
  • 25
  • 109
  • 191
PeteT
  • 18,754
  • 26
  • 95
  • 132
  • Similar: http://stackoverflow.com/a/9908392/1305911 – JNF Oct 24 '12 at 10:58
  • I would use a regular expressions for that then you have to optimal way of matching the string within the string. This in one of the beautiful DSLs we all should use when possible. [An example](http://www.regular-expressions.info/dotnet.html "Link") in VB.net the code is almost the same in C#. – bovium Oct 09 '08 at 11:09
  • 4
    I would place good money on the regular expressions version being significantly harder to get right than "keep looping and doing simple String.IndexOf". Regular expressions have their place, but shouldn't be used when simpler alternatives exist. – Jon Skeet Oct 09 '08 at 12:06
  • In fact it's the total opposite! Using IndexOf is actually an order of magnitude (or two) faster than a regular expression based solution and even faster than counting the occurrences iterating of the chars on the string. I did a complete benchmark of the methods and will be posting here as an answer for future reference! – Loudenvier Dec 30 '22 at 17:30

11 Answers11

110

You really could use the regular expression /((s).*?){n}/ to search for n-th occurrence of substring s.

In C# it might look like this:

public static class StringExtender
{
    public static int NthIndexOf(this string target, string value, int n)
    {
        Match m = Regex.Match(target, "((" + Regex.Escape(value) + ").*?){" + n + "}");

        if (m.Success)
            return m.Groups[2].Captures[n - 1].Index;
        else
            return -1;
    }
}

Note: I have added Regex.Escape to original solution to allow searching characters which have special meaning to regex engine.

whyleee
  • 4,019
  • 1
  • 31
  • 33
Alexander Prokofyev
  • 33,874
  • 33
  • 95
  • 118
  • 2
    should you be escaping the `value`? In my case I was looking for a dot http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.escape.aspx – russau Jun 06 '11 at 06:11
  • 3
    This Regex does not work if the target string contains linebreaks. Could you fix it? Thanks. – Ignacio Soler Garcia Sep 01 '11 at 11:25
  • Seems to lock if there isn't an Nth match. I needed to limit a comma separated value to 1000 values, and this hung when the csv had less. So @Yogesh -- probably not a great accepted answer as is. ;) Using a variant of [this answer](http://stackoverflow.com/a/6004505/1028230) (there's a string to string version [here](http://stackoverflow.com/a/11773674/1028230)) and [changed the loop to stop at nth count](http://pastebin.com/w6aPDn3x) instead. – ruffin Oct 11 '12 at 14:44
  • Trying to search on \, value passed in is "\\", and the match string looks like this before the regex.match function: ((\).*?){2}. I get this error: parsing "((\).*?){2}" - Not enough )'s. What is the correct format to look for back slashes without an error? – RichieMN Feb 19 '14 at 16:53
  • @eMi, I have slightly modified code to allow searching special characters. – Alexander Prokofyev Jan 14 '15 at 11:24
  • 5
    Sorry but a minor criticism: regex solutions are suboptimal, because then I have to relearn regexs for the nth time. The code is essentially more difficult to read when regexes are used. – Mark Rogers May 28 '15 at 17:27
54

That's basically what you need to do - or at least, it's the easiest solution. All you'd be "wasting" is the cost of n method invocations - you won't actually be checking any case twice, if you think about it. (IndexOf will return as soon as it finds the match, and you'll keep going from where it left off.)

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • 3
    I suppose your right, it does seem like there should be a built in method though, i'm sure it's a commmon occurrence. – PeteT Oct 09 '08 at 10:48
  • 4
    Really? I can't remember ever having to do it in about 13 years of Java and C# development. That doesn't mean I've really never had to do it - but just not often enough to remember. – Jon Skeet Oct 09 '08 at 11:01
  • Speaking of Java, we have `StringUtils.ordinalIndexOf()`. C# with all the Linq and other wonderful features, just doesn't have a built-in support for this. And yes it is very imperative to have its support if you are dealing with parsers and tokenizers. – Annie Mar 21 '14 at 10:22
  • 5
    @Annie: You say "we have" - do you mean in Apache Commons? If so, you can write your own third party library for .NET just as easily as you can for Java... so it's not like that's something the Java standard library has that .NET doesn't. And of course in C# you can add it as an extension method on `string` :) – Jon Skeet Mar 21 '14 at 10:24
20

That's basically what you need to do - or at least, it's the easiest solution. All you'd be "wasting" is the cost of n method invocations - you won't actually be checking any case twice, if you think about it. (IndexOf will return as soon as it finds the match, and you'll keep going from where it left off.)

Here is the recursive implementation (of the above idea) as an extension method, mimicing the format of the framework method(s):

public static int IndexOfNth(this string input,
                             string value, int startIndex, int nth)
{
    if (nth < 1)
        throw new NotSupportedException("Param 'nth' must be greater than 0!");
    if (nth == 1)
        return input.IndexOf(value, startIndex);
    var idx = input.IndexOf(value, startIndex);
    if (idx == -1)
        return -1;
    return input.IndexOfNth(value, idx + 1, --nth);
}

Also, here are some (MBUnit) unit tests that might help you (to prove it is correct):

using System;
using MbUnit.Framework;

namespace IndexOfNthTest
{
    [TestFixture]
    public class Tests
    {
        //has 4 instances of the 
        private const string Input = "TestTest";
        private const string Token = "Test";

        /* Test for 0th index */

        [Test]
        public void TestZero()
        {
            Assert.Throws<NotSupportedException>(
                () => Input.IndexOfNth(Token, 0, 0));
        }

        /* Test the two standard cases (1st and 2nd) */

        [Test]
        public void TestFirst()
        {
            Assert.AreEqual(0, Input.IndexOfNth("Test", 0, 1));
        }

        [Test]
        public void TestSecond()
        {
            Assert.AreEqual(4, Input.IndexOfNth("Test", 0, 2));
        }

        /* Test the 'out of bounds' case */

        [Test]
        public void TestThird()
        {
            Assert.AreEqual(-1, Input.IndexOfNth("Test", 0, 3));
        }

        /* Test the offset case (in and out of bounds) */

        [Test]
        public void TestFirstWithOneOffset()
        {
            Assert.AreEqual(4, Input.IndexOfNth("Test", 4, 1));
        }

        [Test]
        public void TestFirstWithTwoOffsets()
        {
            Assert.AreEqual(-1, Input.IndexOfNth("Test", 8, 1));
        }
    }
}
Tod Thomson
  • 4,773
  • 2
  • 33
  • 33
15
private int IndexOfOccurence(string s, string match, int occurence)
{
    int i = 1;
    int index = 0;

    while (i <= occurence && (index = s.IndexOf(match, index + 1)) != -1)
    {
        if (i == occurence)
            return index;

        i++;
    }

    return -1;
}

or in C# with extension methods

public static int IndexOfOccurence(this string s, string match, int occurence)
{
    int i = 1;
    int index = 0;

    while (i <= occurence && (index = s.IndexOf(match, index + 1)) != -1)
    {
        if (i == occurence)
            return index;

        i++;
    }

    return -1;
}
Schotime
  • 15,707
  • 10
  • 46
  • 75
  • 5
    If I'm not mistaken, this method fails if the string to match starts at position 0, which can be corrected by setting `index` initially to -1. – Peter Majeed Jun 27 '12 at 15:59
  • 1
    You may also want to check for null or empty strings s and match or it will throw but thats a design decision. –  Jan 21 '15 at 23:35
  • Thanks @PeterMajeed - if `"BOB".IndexOf("B")` returns 0, so should this function for `IndexOfOccurence("BOB", "B", 1)` – PeterX Feb 17 '15 at 03:09
  • 2
    Yours is probably the ultimate solution since it has both an extension function and it avoids regexs and recursion, both of which make code less readable. – Mark Rogers May 28 '15 at 17:29
  • @tdyen Indeed, Code Analysis will issue ["CA1062: Validate arguments of public methods"](https://msdn.microsoft.com/en-us/library/ms182182.aspx) if `IndexOfOccurence` does not check if `s` is `null`. And [String.IndexOf (String, Int32)](https://msdn.microsoft.com/en-us/library/7cct0x33(v=vs.110).aspx) will throw `ArgumentNullException` if `match` is `null`. – DavidRR Jan 12 '17 at 15:34
2

After some benchmarking, this seems to be the simplest and most effcient solution

public static int IndexOfNthSB(string input,
             char value, int startIndex, int nth)
        {
            if (nth < 1)
                throw new NotSupportedException("Param 'nth' must be greater than 0!");
            var nResult = 0;
            for (int i = startIndex; i < input.Length; i++)
            {
                if (input[i] == value)
                    nResult++;
                if (nResult == nth)
                    return i;
            }
            return -1;
        }
emsimpson92
  • 1,779
  • 1
  • 9
  • 24
ShadowBeast
  • 135
  • 1
  • 2
  • 6
2

Here I go again! Another benchmark answer from yours truly :-) Once again based on the fantastic BenchmarkDotNet package (if you're serious about benchmarking dotnet code, please, please use this package).

The motivation for this post is two fold: PeteT (who asked it originally) wondered that it seems wasteful to use String.IndexOf varying the startIndex parameter in a loop to find the nth occurrence of a character while, in fact, it's the fastest method, and because some answers uses regular expressions which are an order of magnitude slower (and do not add any benefits, in my opinion not even readability, in this specific case).

Here is the code I've ended up using in my string extensions library (it's not a new answer to this question, since others have already posted semantically identical code here, I'm not taking credit for it). This is the fastest method (even, possibly, including unsafe variations - more on that later):

public static int IndexOfNth(this string str, char ch, int nth, int startIndex = 0) {
    if (str == null)
        throw new ArgumentNullException("str");
    var idx = str.IndexOf(ch, startIndex);
    while (idx >= 0 && --nth > 0)
        idx = str.IndexOf(ch, startIndex + idx + 1);
    return idx;
}

I've benchmarked this code against two other methods and the results follow:

Benchmark results

The benchmarked methods were:

[Benchmark]
public int FindNthRegex() {
    Match m = Regex.Match(text, "((" + Regex.Escape("z") + ").*?){" + Nth + "}");
    return (m.Success)
        ? m.Groups[2].Captures[Nth - 1].Index
        : -1;
}
[Benchmark]
public int FindNthCharByChar() {
    var occurrence = 0;
    for (int i = 0; i < text.Length; i++) {
        if (text[i] == 'z')
            occurrence++;
        if (Nth == occurrence)
            return i;
    }
    return -1;
}
[Benchmark]
public int FindNthIndexOfStartIdx() {
    var idx = text.IndexOf('z', 0);
    var nth = Nth;
    while (idx >= 0 && --nth > 0)
        idx = text.IndexOf('z', idx + 1);
    return idx;
}

The FindNthRegex method is the slower of the bunch, taking an order (or two) of magnitude more time than the fastest. FindNthByChar loops over each char on the string and counts each match until it finds the nth occurrence. FindNthIndexOfStartIdx uses the method suggested by the opener of this question which, indeed, is the same I've been using for ages to accomplish this and it is the fastest of them all.

Why is it so much faster than FindNthByChar? It's because Microsoft went to great lengths to make string manipulation as fast as possible in the dotnet framework. And they've accomplished that! They did an amazing job! I've done a deeper investigation on string manipulations in dotnet in an CodeProject article which tries to find the fastest method to remove all whitespace from a string:

Fastest method to remove all whitespace from Strings in .NET

There you'll find why string manipulations in dotnet are so fast, and why it's next to useless trying to squeeze more speed by writing our own versions of the framework's string manipulation code (the likes of string.IndexOf, string.Split, string.Replace, etc.)

The full benchmark code I've used follows (it's a dotnet6 console program):

UPDATE: Added two methods FindNthCharByCharInSpan and FindNthCharRecursive (and now FindNthByLinq).

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Text;
using System.Text.RegularExpressions;

var summary = BenchmarkRunner.Run<BenchmarkFindNthChar>();

public class BenchmarkFindNthChar
{
    const string BaseText = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    [Params(100, 1000)]
    public int BaseTextRepeatCount { get; set; }
    [Params(500)]
    public int Nth { get; set; }
    private string text;
    [GlobalSetup]
    public void BuildTestData() {
        var sb = new StringBuilder();
        for (int i = 0; i < BaseTextRepeatCount; i++)
            sb.AppendLine(BaseText);
        text = sb.ToString();
    }
    [Benchmark]
    public int FindNthRegex() {
        Match m = Regex.Match(text, "((" + Regex.Escape("z") + ").*?){" + Nth + "}");
        return (m.Success)
            ? m.Groups[2].Captures[Nth - 1].Index
            : -1;
    }
    [Benchmark]
    public int FindNthCharByChar() {
        var occurrence = 0;
        for (int i = 0; i < text.Length; i++) {
            if (text[i] == 'z')
                occurrence++;
            if (Nth == occurrence)
                return i;
        }
        return -1;
    }
    [Benchmark]
    public int FindNthIndexOfStartIdx() {
        var idx = text.IndexOf('z', 0);
        var nth = Nth;
        while (idx >= 0 && --nth > 0)
            idx = text.IndexOf('z', idx + 1);
        return idx;
    }

    [Benchmark]
    public int FindNthCharByCharInSpan() {
        var span = text.AsSpan();   
        var occurrence = 0;
        for (int i = 0; i < span.Length; i++) {
            if (span[i] == 'z')
                occurrence++;
            if (Nth == occurrence)
                return i;
        }
        return -1;
    }
    [Benchmark]
    public int FindNthCharRecursive() => IndexOfNth(text, "z", 0, Nth);
    public static int IndexOfNth(string input, string value, int startIndex, int nth) {
        if (nth == 1)
            return input.IndexOf(value, startIndex);
        var idx = input.IndexOf(value, startIndex);
        if (idx == -1)
            return -1;
        return IndexOfNth(input, value, idx + 1, --nth);
    }
    [Benchmark]
    public int FindNthByLinq() {
        var items = text.Select((c, i) => (c, i)).Where(t => t.c == 'z');
        return (items.Count() > Nth - 1)
            ? items.ElementAt(Nth - 1).i
            : -1;
    }    

}

UPDATE 2: The new benchmark results (with Linq-based benchmark) follows:

New benchmark results

The Linq-based solution is only better than the recursive method, but it's good to have it here for completeness.

Ben Jasperson
  • 300
  • 4
  • 17
Loudenvier
  • 8,362
  • 6
  • 45
  • 66
  • 1
    @BenJasperson I was just trying to find the fastest, those are way too slow compared to these. But, your wish is my command... I will add those :-) I'll comment here when it's done. – Loudenvier Jan 24 '23 at 21:49
  • 1
    @BenJasperson added the recursive solution, but didn't manage to figure out any linq based solution worth to add here. – Loudenvier Jan 24 '23 at 23:07
  • 1
    Nice, thanks! For the Linq based solution, I was meaning something like @Matthias's answer: `public static int NthIndexOf(this string haystack, char needle, int n) => haystack.Select((c, i) => new Tuple(c, i)).Where(t => t.Item1 == needle).ElementAt(n).Item2;` But, apparently, I have to be careful what I wish for. I only wish you to add that in if you want to. You're already comparing quite a lot. – Ben Jasperson Jan 26 '23 at 05:56
  • 1
    @BenJasperson I've just added the Linq-based solution. Had to tweak it a little because the first test actually tries to find a char past the end of the text and that would cause and Index out of bounds exception, but it behaves the same otherwise. It's only better than the recursive solution. Thanks for pushing me to add more testes for completeness sake! – Loudenvier Jan 28 '23 at 01:22
1

Maybe it would also be nice to work with the String.Split() Method and check if the requested occurrence is in the array, if you don't need the index, but the value at the index

Gwenc37
  • 2,064
  • 7
  • 18
  • 22
1

Or something like this with the do while loop

 private static int OrdinalIndexOf(string str, string substr, int n)
    {
        int pos = -1;
        do
        {
            pos = str.IndexOf(substr, pos + 1);
        } while (n-- > 0 && pos != -1);
        return pos;
    }
xFreeD
  • 41
  • 4
0

System.ValueTuple ftw:

var index = line.Select((x, i) => (x, i)).Where(x => x.Item1 == '"').ElementAt(5).Item2;

writing a function from that is homework

Matthias
  • 15,919
  • 5
  • 39
  • 84
0

Tod's answer can be simplified somewhat.

using System;

static class MainClass {
    private static int IndexOfNth(this string target, string substring,
                                       int seqNr, int startIdx = 0)
    {
        if (seqNr < 1)
        {
            throw new IndexOutOfRangeException("Parameter 'nth' must be greater than 0.");
        }

        var idx = target.IndexOf(substring, startIdx);

        if (idx < 0 || seqNr == 1) { return idx; }

        return target.IndexOfNth(substring, --seqNr, ++idx); // skip
    }

    static void Main () {
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 1));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 2));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 3));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 4));
    }
}

Output

1
3
5
-1
ivvi
  • 5,120
  • 5
  • 36
  • 53
-4

This might do it:

Console.WriteLine(str.IndexOf((@"\")+2)+1);
Linuxios
  • 34,849
  • 13
  • 91
  • 116