0

I am trying to optimize a method.

Here is the original method:

private const string Separator = "::";
private char[] SeparatorChars = Separator.ToCharArray();

public string GetSubName(string name)
{
    if (name.Contains(Separator))
    {
        return name.Split(SeparatorChars)[0];
    }
    
    return "INVALID";
}

if name doesn't contain ::, return INVALID else return the first element in the array by generated by Split with ::.

I have wrote the following optimized method:

public string GetSubNameOpt(string name)
{
    var index = name.IndexOf(Separator, StringComparison.Ordinal);
    if (index >= 0)
    {
        return name.AsSpan().Slice(0, index).ToString();
    }

    return "INVALID";
}

The goal was to omit the second O(N) iteration over the string in order to split it in case it contains the :: substring.

I benchmarked both method using BenchmarkDotNet and here are the results enter image description here

Now.. while, as expected, the Optimized method is better in time and memory in the "name contains :: case" due usage of AsSpan and removing of 1 O(N) iteration of the string, it was surprise to me that the Non Optimized method was better for the INVALID case.

EDIT Running with a non empty, not containing :: case. enter image description here

Again, the Optimized method is slower...

Can you explain what is causing this behavior?

dbc
  • 104,963
  • 20
  • 228
  • 340
NirMH
  • 4,769
  • 3
  • 44
  • 69
  • Which .NET version did you use? – Klaus Gütter Aug 31 '23 at 17:13
  • You know you don't need to use `Span` to get a contiguous substring right? – Ben Voigt Aug 31 '23 at 17:15
  • Also note that the `Split(SeparatorChars)` version is wrong. Try an input of `AB:CDE::FGH`. `Split(Separator)` would be better, except it didn't exist in 4.6.2. – Ben Voigt Aug 31 '23 at 17:17
  • @KlausGütter: net4.6.2 - I've updated my question – NirMH Aug 31 '23 at 17:17
  • @BenVoigt: you are right, my question is not about the span or the general logic... it is why there is a performance difference between Contains and IndexOf ... – NirMH Aug 31 '23 at 17:18
  • Have you tested with a non-empty input (but that also doesn't contain the separator)? – Ben Voigt Aug 31 '23 at 17:21
  • 4
    At least for .NE 4.8 this seems unlikely, as `Contains` is implemented by calling `IndexOf` (see https://referencesource.microsoft.com/#mscorlib/system/string.cs). I doubt that this is different for .NET 4,6,2 – Klaus Gütter Aug 31 '23 at 17:22
  • Also, `Contains` calls `IndexOf`, so the performance difference is an artifact of your testing regimen. https://github.com/microsoft/referencesource/blob/51cf7850defa8a17d815b4700b67116e3fa283c2/mscorlib/system/string.cs#L2172 – Ben Voigt Aug 31 '23 at 17:23
  • @BenVoigt: So you say it is a measurement error? it is a 70% error... are you sure? this sounds too simple root cause – NirMH Aug 31 '23 at 17:25
  • @NirMH: Is your input the empty string? Have you tried testing with a non-empty input that doesn't match? It's easy for measurement noise to be 70% of zero... – Ben Voigt Aug 31 '23 at 17:30
  • @BenVoigt: Tested with input `FGH`. the Opt method runs 46.008ns while the regular 40.016ns. again the Contains is better than IndexOf – NirMH Aug 31 '23 at 17:30
  • @NirMH: Are you aware there are other reasons for one method to be faster than another unrelated to the methods they call internally? – Ben Voigt Aug 31 '23 at 17:32
  • Please show the ildasm output for both functions. I have a suspicion what's causing the "unoptimized" version to run slower. – Ben Voigt Aug 31 '23 at 17:32
  • @BenVoigt: I think this is the help I am seeking - what mechanisms / other reasons can explain this? – NirMH Aug 31 '23 at 17:32
  • How can I show the 'ildasm'? – NirMH Aug 31 '23 at 17:33
  • You're compiling this code into an assembly (DLL or EXE file) right? Run the ildasm tool that comes with .NET on that file. – Ben Voigt Aug 31 '23 at 17:34
  • 1
    But I think the answer is here: https://stackoverflow.com/a/55432110/103167 If the "unoptimized" method gets inlined and the "optimized" one doesn't, then the cost of an extra function call will tip the scales. – Ben Voigt Aug 31 '23 at 17:34
  • 2
    You might try this one as well: `public string GetSubNameOpt2(string name) { var index = name.IndexOf(Separator, StringComparison.Ordinal); if (index >= 0) { return name.Substring(0, index); } return "INVALID"; }` – Ben Voigt Aug 31 '23 at 17:37
  • 1
    Did you measure a Debug build or a Release build? – Klaus Gütter Aug 31 '23 at 18:38
  • It's subtle but I can reproduce this effect, see https://dotnetfiddle.net/ahLC9T. When the separator string is not found and the methods `return "INVALID";`, the unoptimized version is a trifle faster. But if the separator string is found the optimized version is much faster, see https://dotnetfiddle.net/gTUbf7. – dbc Aug 31 '23 at 19:45
  • Incidentally, your unoptimized and optimized methods **return different results** when the `name` string contains a single `:` preceding a double `::`. Given the input string `"a:b::c"` the first method returns `a` while the optimized version returns `a:b`, see https://dotnetfiddle.net/gTUbf7. – dbc Aug 31 '23 at 19:47
  • @dbc: Pointed out much earlier (third comment) – Ben Voigt Aug 31 '23 at 19:51
  • @BenVoigt - ah so it is. Comment thread is so long I missed it. – dbc Aug 31 '23 at 19:52
  • @dbc I get even better results when using `AsSpan().IndexOf` https://dotnetfiddle.net/gTUbf7 – Charlieface Aug 31 '23 at 23:20

1 Answers1

2

Because string.Contains(string) (i.e. without explicitly specified StringComparison) is a very special beast in recent implementations of .NET and looks something like the following:

public bool Contains(string value)
{
    if (value == null)
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.value);
 
    return SpanHelpers.IndexOf(
        ref _firstChar,
        Length,
        ref value._firstChar,
        value.Length) >= 0;
}

While the string.Contains(string, StringComparison) overload just call IndexOf:

public bool Contains(string value, StringComparison comparisonType)
{
    return IndexOf(value, comparisonType) >= 0;
}

i.e. name.Contains(Separator, StringComparison.Ordinal) will result in "expected" performance.

And IndexOf for ordinal comparison currently uses internal Ordinal.IndexOf which contains quite a lot of checks and different optimizations for more general use-case.

One thing you can try is using AsSpan().IndexOf which on my machine results in better performance for the optimized version:

public string name { get; set; } = "qwerty";
private const string Separator = "::";
private char[] SeparatorChars = Separator.ToCharArray();

[Benchmark]
public string GetSubName()
{
    if (name.Contains(Separator))
    {
        return name.Split(SeparatorChars)[0];
    }
    
    return "INVALID";
}

[Benchmark]
public string GetSubNameOpt()
{
    var index = name.AsSpan().IndexOf(SeparatorChars);
    if (index >= 0)
    {
        return name.AsSpan().Slice(0, index).ToString();
    }
    
    return "INVALID";
}
Method Mean Error StdDev
GetSubName 8.850 ns 0.2388 ns 0.3856 ns
GetSubNameOpt 8.430 ns 0.2337 ns 0.2400 ns
Guru Stron
  • 102,774
  • 10
  • 95
  • 132