Why doesn't String.Contains call the final overload directly?

Question

The String.Contains method looks like this internally

public bool Contains(string value)
{
   return this.IndexOf(value, StringComparison.Ordinal) >= 0;
}

The IndexOf overload that is called looks like this

public int IndexOf(string value, StringComparison comparisonType)
{
   return this.IndexOf(value, 0, this.Length, comparisonType);
}

Here another call is made to the final overload, which then calls the relevant CompareInfo.IndexOf method, with the signature

public int IndexOf(string value, int startIndex, int count, StringComparison comparisonType)

Therefore, calling the final overload would be the fastest (albeit may be considered a micro optimization in most cases).

I may be missing something obvious but why does the Contains method not call the final overload directly considering that no other work is done in the intermediate call and that the same information is available at both stages?

Is the only advantage that if the signature of the final overload changes, only one change needs to be made (that of the intermediate method), or is there more to the design than that?

Edit from the comments (see update 2 for speed difference explanation)

To clarify the performance differences I'm getting in case I've made a mistake somewhere: I ran this benchmark (looped 5 times to avoid jitter bias) and used this extension method to compare against the String.Contains method

public static bool QuickContains(this string input, string value)
{
   return input.IndexOf(value, 0, input.Length, StringComparison.OrdinalIgnoreCase) >= 0;
}

with the loop looking like this

for (int i = 0; i < 1000000; i++)
{
   bool containsStringRegEx = testString.QuickContains("STRING");
}
sw.Stop();
Console.WriteLine("QuickContains: " + sw.ElapsedMilliseconds);

In the benchmark test, QuickContains seems about 50% faster than String.Contains on my machine.

Update 2 (performance difference explained)

I've spotted something unfair in the benchmark which explains a lot. The benchmark itself was to measure case-insensitive strings but since String.Contains can only perform case-sensitive searches, the ToUpper method was included. This would skew the results, not in terms of final output, but at least in terms of simply measuring String.Contains performance in non case-sensitive searches.

So now, if I use this extension method

public static bool QuickContains(this string input, string value)
{
   return input.IndexOf(value, 0, input.Length, StringComparison.Ordinal) >= 0;
}

use StringComparison.Ordinal in the 2 overload IndexOf call and remove ToUpper, the QuickContains method actually becomes the slowest. IndexOf and Contains are pretty much on par in terms of performance. So clearly it was the ToUpper call skewing the results of why there was such a discrepancy between Contains and IndexOf.

Not sure why the QuickContains extension method has become the slowest. (Possibly related to the fact that Contains has the [__DynamicallyInvokable, TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")] attribute?).

Question still remains as to why the 4 overload method isn't called directly but it seems performance isn't impacted (as Adrian and delnan pointed out in the comments) by the decision.

It's possible that the method call will be inlined by the compiler, so any occurrences of `String.Contains(string value)` may well already have be rewritten into the more complex version by the time the code comes to run (leaving the only reason being the advantage you mention, with no disadvantage). However, this is obviously just conjecture. — Adrian Wragg, Jul 10 '13 at 22:01
That's a good thought. I just tried benchmarking it by creating a custom `Contains` method that calls the final overload directly and it ended up being 50% faster than the existing `Contains` method so there is a noticeable performance hit (albeit a small one in absolute terms; 200ms over 1m loops). — keyboardP, Jul 10 '13 at 22:17
@keyboardP I find that a bit hard to believe; even the simplest inlining heuristic and the simplest inliner should make the two variants indistinguishable. Are you sure you didn't commit [benchmarking mistakes](http://tech.pro/blog/1293/c-performance-benchmark-mistakes-part-one)? — , Jul 10 '13 at 22:50
@delnan - Thanks for the link, going through it now but the benchmark I used is [this one](http://stackoverflow.com/questions/17579368/regex-ismatch-vs-string-toupper-contains-performance/17579471#17579471) (happy to fix any issues there). I created a new method which simply called the `IndexOf` with four overloads and threw that with the other tests). — keyboardP, Jul 10 '13 at 22:52
@keyboardP One more thing, the second articles doesn't appear to link to the [third](http://tech.pro/tutorial/1317/c-performance-benchmark-mistakes-part-three). You appear to have part one and two covered, code-wise (I can't tell *how* you measured it, i.e. outside VS, compiled in release mode etc.). — , Jul 10 '13 at 22:56
@delnan - I ran in both `Release` and `Debug` mode (both launched from VS) and, what's missing from the code in the link, is that I ran it in a for loop 5 times to avoid the 'first run' issue. AFAICT I've covered the last two points. — keyboardP, Jul 10 '13 at 22:58
@keyboardP Okay, thanks and thumbs up. That is surprising but I'll have to swallow it. — , Jul 10 '13 at 23:00
@delnan - It's certainly possible I've made a mistake somewhere so no need to rule that aspect out completely :D But I was curious as to why `Contains` who so slow compared to `IndexOf` if it simply called it and it seems that there may be something more to it. — keyboardP, Jul 10 '13 at 23:02
@delnan - You were right! I've further updated the question but essentially the benchmark was also taking into account the "ToUpper" method call which skewed the results. `Contain` and `IndexOf` are pretty much identical in terms of performance as expected. — keyboardP, Jul 11 '13 at 00:34

JimmiTh · Accepted Answer · 2013-07-11T20:35:06.217

Been a while (years) since I've looked at assembly, and I know close to nothing about MSIL and JIT, so it would be a nice exercise - couldn't resist, so here's just a bit of, possibly redundant, empirical data. Does the IndexOf overload get inlined?

Here's a tiny Console app:

class Program
{
    static void Main(string[] args)
    {
        "hello".Contains("hell");
    }
}

The JIT generates this in an optimized Release build, Any CPU, running in 32 bit. I've shortened the addresses, and removed some irrelevant lines:

--- ...\Program.cs 
            "hello".Contains("hell");
[snip]
17  mov         ecx,dword ptr ds:[0320219Ch] ; pointer to "hello"
1d  mov         edx,dword ptr ds:[032021A0h] ; pointer to "hell"
23  cmp         dword ptr [ecx],ecx 
25  call        680A6A6C                     ; String.Contains()
[snip]

The call at 0x00000025 goes here:

String.Contains

00  push        0                 ; startIndex = 0
02  push        dword ptr [ecx+4] ; count = this.Length (second DWORD of String)
05  push        4                 ; comparisonType = StringComparison.Ordinal
07  call        FF9655A4          ; String.IndexOf()
0c  test        eax,eax 
0e  setge       al                ; if (... >= 0)
11  movzx       eax,al 
14  ret

Sure enough, it seems to call, directly, the final String.IndexOf overload with four arguments: three pushed; one in edx (value: "hell"); this ("hello") in ecx. To confirm, this is where the call at 0x00000005 goes:

00  push        ebp 
01  mov         ebp,esp 
03  push        edi 
04  push        esi 
05  push        ebx 
06  mov         esi,ecx                  ; this ("hello")
08  mov         edi,edx                  ; value ("hell")
0a  mov         ebx,dword ptr [ebp+10h] 
0d  test        edi,edi                  ; if (value == null)
0f  je          00A374D0 
15  test        ebx,ebx                  ; if (startIndex < 0)
17  jl          00A374FB 
1d  cmp         dword ptr [esi+4],ebx    ; if (startIndex > this.Length)
20  jl          00A374FB 
26  cmp         dword ptr [ebp+0Ch],0    ; if (count < 0)
2a  jl          00A3753F 
[snip]

... which would be the body of:

public int IndexOf(string value, 
                   int startIndex, 
                   int count, 
                   StringComparison comparisonType)
{
  if (value == null)
    throw new ArgumentNullException("value");
  if (startIndex < 0 || startIndex > this.Length)
    throw new ArgumentOutOfRangeException("startIndex",
             Environment.GetResourceString("ArgumentOutOfRange_Index"));
  if (count < 0 || startIndex > this.Length - count)
    throw new ArgumentOutOfRangeException("count",
             Environment.GetResourceString("ArgumentOutOfRange_Count"));
  ...
}

Wow, thanks for effort! It seems that the performance difference is negligible (for non-case sensitive searches) and so it comes down to a design decision. I guess it really might just be a case of having to avoid two code changes should the 4 overload signature ever change (however unlikely). — keyboardP, Jul 11 '13 at 20:31
Of course, right now I realized that the call stack could tell me this. But fun to have my first real look into the JIT output, nonetheless. — JimmiTh, Jul 11 '13 at 20:43
I haven't delved into Assembly to any real degree so your answer has been very interesting to go through. — keyboardP, Jul 11 '13 at 20:47

Why doesn't String.Contains call the final overload directly?

1 Answers1