Why string.Contains performs better than string.IndexOf?

Question

I am trying to optimize a method.

Here is the original method:

private const string Separator = "::";
private char[] SeparatorChars = Separator.ToCharArray();

public string GetSubName(string name)
{
    if (name.Contains(Separator))
    {
        return name.Split(SeparatorChars)[0];
    }
    
    return "INVALID";
}

if name doesn't contain ::, return INVALID else return the first element in the array by generated by Split with ::.

I have wrote the following optimized method:

public string GetSubNameOpt(string name)
{
    var index = name.IndexOf(Separator, StringComparison.Ordinal);
    if (index >= 0)
    {
        return name.AsSpan().Slice(0, index).ToString();
    }

    return "INVALID";
}

The goal was to omit the second O(N) iteration over the string in order to split it in case it contains the :: substring.

I benchmarked both method using BenchmarkDotNet and here are the results

Now.. while, as expected, the Optimized method is better in time and memory in the "name contains :: case" due usage of AsSpan and removing of 1 O(N) iteration of the string, it was surprise to me that the Non Optimized method was better for the INVALID case.

EDIT Running with a non empty, not containing :: case.

Again, the Optimized method is slower...

Can you explain what is causing this behavior?

You know you don't need to use `Span` to get a contiguous substring right? — Ben Voigt, Aug 31 '23 at 17:15
Also note that the `Split(SeparatorChars)` version is wrong. Try an input of `AB:CDE::FGH`. `Split(Separator)` would be better, except it didn't exist in 4.6.2. — Ben Voigt, Aug 31 '23 at 17:17
@BenVoigt: you are right, my question is not about the span or the general logic... it is why there is a performance difference between Contains and IndexOf ... — NirMH, Aug 31 '23 at 17:18
Have you tested with a non-empty input (but that also doesn't contain the separator)? — Ben Voigt, Aug 31 '23 at 17:21
At least for .NE 4.8 this seems unlikely, as `Contains` is implemented by calling `IndexOf` (see https://referencesource.microsoft.com/#mscorlib/system/string.cs). I doubt that this is different for .NET 4,6,2 — Klaus Gütter, Aug 31 '23 at 17:22
Also, `Contains` calls `IndexOf`, so the performance difference is an artifact of your testing regimen. https://github.com/microsoft/referencesource/blob/51cf7850defa8a17d815b4700b67116e3fa283c2/mscorlib/system/string.cs#L2172 — Ben Voigt, Aug 31 '23 at 17:23
@BenVoigt: So you say it is a measurement error? it is a 70% error... are you sure? this sounds too simple root cause — NirMH, Aug 31 '23 at 17:25
@NirMH: Is your input the empty string? Have you tried testing with a non-empty input that doesn't match? It's easy for measurement noise to be 70% of zero... — Ben Voigt, Aug 31 '23 at 17:30
@BenVoigt: Tested with input `FGH`. the Opt method runs 46.008ns while the regular 40.016ns. again the Contains is better than IndexOf — NirMH, Aug 31 '23 at 17:30
@NirMH: Are you aware there are other reasons for one method to be faster than another unrelated to the methods they call internally? — Ben Voigt, Aug 31 '23 at 17:32
Please show the ildasm output for both functions. I have a suspicion what's causing the "unoptimized" version to run slower. — Ben Voigt, Aug 31 '23 at 17:32
@BenVoigt: I think this is the help I am seeking - what mechanisms / other reasons can explain this? — NirMH, Aug 31 '23 at 17:32
You're compiling this code into an assembly (DLL or EXE file) right? Run the ildasm tool that comes with .NET on that file. — Ben Voigt, Aug 31 '23 at 17:34
But I think the answer is here: https://stackoverflow.com/a/55432110/103167 If the "unoptimized" method gets inlined and the "optimized" one doesn't, then the cost of an extra function call will tip the scales. — Ben Voigt, Aug 31 '23 at 17:34
You might try this one as well: `public string GetSubNameOpt2(string name) { var index = name.IndexOf(Separator, StringComparison.Ordinal); if (index >= 0) { return name.Substring(0, index); } return "INVALID"; }` — Ben Voigt, Aug 31 '23 at 17:37
It's subtle but I can reproduce this effect, see https://dotnetfiddle.net/ahLC9T. When the separator string is not found and the methods `return "INVALID";`, the unoptimized version is a trifle faster. But if the separator string is found the optimized version is much faster, see https://dotnetfiddle.net/gTUbf7. — dbc, Aug 31 '23 at 19:45
Incidentally, your unoptimized and optimized methods **return different results** when the `name` string contains a single `:` preceding a double `::`. Given the input string `"a:b::c"` the first method returns `a` while the optimized version returns `a:b`, see https://dotnetfiddle.net/gTUbf7. — dbc, Aug 31 '23 at 19:47
@BenVoigt - ah so it is. Comment thread is so long I missed it. — dbc, Aug 31 '23 at 19:52
@dbc I get even better results when using `AsSpan().IndexOf` https://dotnetfiddle.net/gTUbf7 — Charlieface, Aug 31 '23 at 23:20

score 2 · Accepted Answer · answered Sep 01 '23 at 06:02

Because string.Contains(string) (i.e. without explicitly specified StringComparison) is a very special beast in recent implementations of .NET and looks something like the following:

public bool Contains(string value)
{
    if (value == null)
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.value);
 
    return SpanHelpers.IndexOf(
        ref _firstChar,
        Length,
        ref value._firstChar,
        value.Length) >= 0;
}

While the string.Contains(string, StringComparison) overload just call IndexOf:

public bool Contains(string value, StringComparison comparisonType)
{
    return IndexOf(value, comparisonType) >= 0;
}

i.e. name.Contains(Separator, StringComparison.Ordinal) will result in "expected" performance.

And IndexOf for ordinal comparison currently uses internal Ordinal.IndexOf which contains quite a lot of checks and different optimizations for more general use-case.

One thing you can try is using AsSpan().IndexOf which on my machine results in better performance for the optimized version:

public string name { get; set; } = "qwerty";
private const string Separator = "::";
private char[] SeparatorChars = Separator.ToCharArray();

[Benchmark]
public string GetSubName()
{
    if (name.Contains(Separator))
    {
        return name.Split(SeparatorChars)[0];
    }
    
    return "INVALID";
}

[Benchmark]
public string GetSubNameOpt()
{
    var index = name.AsSpan().IndexOf(SeparatorChars);
    if (index >= 0)
    {
        return name.AsSpan().Slice(0, index).ToString();
    }
    
    return "INVALID";
}

Method	Mean	Error	StdDev
GetSubName	8.850 ns	0.2388 ns	0.3856 ns
GetSubNameOpt	8.430 ns	0.2337 ns	0.2400 ns

Why string.Contains performs better than string.IndexOf?

1 Answers1