38

A consultant came by yesterday and somehow the topic of strings came up. He mentioned that he had noticed that for strings less than a certain length, Contains is actually faster than StartsWith. I had to see it with my own two eyes, so I wrote a little app and sure enough, Contains is faster!

How is this possible?

DateTime start = DateTime.MinValue;
DateTime end = DateTime.MinValue;
string str = "Hello there";

start = DateTime.Now;
for (int i = 0; i < 10000000; i++)
{
    str.Contains("H");
}
end = DateTime.Now;
Console.WriteLine("{0}ms using Contains", end.Subtract(start).Milliseconds);

start = DateTime.Now;
for (int i = 0; i < 10000000; i++)
{
    str.StartsWith("H");
}
end = DateTime.Now;
Console.WriteLine("{0}ms using StartsWith", end.Subtract(start).Milliseconds);

Outputs:

726ms using Contains 
865ms using StartsWith

I've tried it with longer strings too!

John Kugelman
hackerhasid
  • 2
    Two things. Try switching the order to see if it affects results. Then, since this is an implementation-specific question, look at the source code, through Reflector if necessary. It is likely `Contains` is more carefully optimized (possibly using native code) because it's used more often. – Matthew Flaschen Jun 25 '10 at 17:30
  • 5
    Micro-optimizations are rarely useful. You're comparing a string of max length of maybe 20 characters or so over 10 million iterations and saving a whopping ~140ms. Try it with longer strings or a more valid use case and see if you get the same numbers. – Chris Jun 25 '10 at 17:33
  • 11
    Your time measurements are flawed. You should be using a Stopwatch object to track the time, not DateTimes. If you are going to use DateTimes, you should at least use end.Subtract(start).TotalMilliseconds – Justin Niessner Jun 25 '10 at 17:35
  • The timing does not seem to change based upon string length. But I'd also ask does this matter? The amount of time these commands are taking is so small, I can't see it impacting an application's performance. And I'd rather see the slower StartsWith option than something else that's trying to do the same thing. – Jeff Siver Jun 25 '10 at 17:52

6 Answers

30

Try using Stopwatch to measure the speed instead of comparing DateTime values.

Stopwatch vs. using System.DateTime.Now for timing events
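
To make the timing advice concrete, here is a minimal sketch (the class name `TimingDemo` is just for illustration). It shows the pitfall raised in the question's comments: `TimeSpan.Milliseconds` is only the millisecond component of an interval, while `TotalMilliseconds` is the whole interval. It then times a loop with `Stopwatch`.

```csharp
using System;
using System.Diagnostics;

public static class TimingDemo
{
    public static void Main()
    {
        // A fixed 1.5 second interval, built by hand so the numbers are predictable.
        TimeSpan interval = TimeSpan.FromMilliseconds(1500);

        // Milliseconds is only the millisecond component (0-999) of the interval.
        Console.WriteLine(interval.Milliseconds);      // 500
        // TotalMilliseconds is the whole interval expressed in milliseconds.
        Console.WriteLine(interval.TotalMilliseconds); // 1500

        // Stopwatch measures elapsed time directly; Elapsed is a TimeSpan.
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < 1000000; i++)
        {
            "Hello there".Contains("H");
        }
        sw.Stop();
        Console.WriteLine("{0}ms elapsed", sw.Elapsed.TotalMilliseconds);
    }
}
```

So any run in the question's benchmark that takes longer than a second would report a misleadingly small number with `.Milliseconds`.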

I think the key is in the following excerpts from the documentation:

Contains:

This method performs an ordinal (case-sensitive and culture-insensitive) comparison.

StartsWith:

This method performs a word (case-sensitive and culture-sensitive) comparison using the current culture.

I think the key is the ordinal comparison which amounts to:

An ordinal sort compares strings based on the numeric value of each Char object in the string. An ordinal comparison is automatically case-sensitive because the lowercase and uppercase versions of a character have different code points. However, if case is not important in your application, you can specify an ordinal comparison that ignores case. This is equivalent to converting the string to uppercase using the invariant culture and then performing an ordinal comparison on the result.
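
As a small illustration of the quoted description (a sketch only, with a made-up class name `OrdinalDemo`), the following compares strings ordinally, with and without case sensitivity:

```csharp
using System;

public static class OrdinalDemo
{
    public static void Main()
    {
        // Ordinal comparison looks only at code unit values: 'a' is 97, 'A' is 65.
        Console.WriteLine(string.CompareOrdinal("a", "A") > 0); // True

        // Ordinal is case-sensitive, so these are not equal.
        Console.WriteLine(string.Equals("hello", "HELLO", StringComparison.Ordinal)); // False

        // OrdinalIgnoreCase treats them as equal...
        Console.WriteLine(string.Equals("hello", "HELLO", StringComparison.OrdinalIgnoreCase)); // True

        // ...which gives the same answer as uppercasing with the invariant
        // culture and then comparing ordinally.
        Console.WriteLine(string.Equals("hello".ToUpperInvariant(), "HELLO".ToUpperInvariant(), StringComparison.Ordinal)); // True
    }
}
```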

References:

http://msdn.microsoft.com/en-us/library/system.string.aspx

http://msdn.microsoft.com/en-us/library/dy85x1sa.aspx

http://msdn.microsoft.com/en-us/library/baketfxw.aspx

Using Reflector you can see the code for the two:

public bool Contains(string value)
{
    return (this.IndexOf(value, StringComparison.Ordinal) >= 0);
}

public bool StartsWith(string value, bool ignoreCase, CultureInfo culture)
{
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }
    if (this == value)
    {
        return true;
    }
    CultureInfo info = (culture == null) ? CultureInfo.CurrentCulture : culture;
    return info.CompareInfo.IsPrefix(this, value,
        ignoreCase ? CompareOptions.IgnoreCase : CompareOptions.None);
}
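
If the culture lookup is the cost, then asking `StartsWith` for an ordinal comparison should remove it. Here is a rough sketch for trying that yourself. The `TimeMs` helper is hypothetical and adds delegate-call overhead to every measurement, so treat the numbers as relative, not absolute.

```csharp
using System;
using System.Diagnostics;

public static class OrdinalStartsWithBenchmark
{
    // Hypothetical helper: times 'iterations' calls of the given check.
    public static double TimeMs(Func<bool> check, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            check();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        string str = "Hello there";
        const int N = 10000000;

        // Contains(string) is ordinal by default.
        Console.WriteLine("{0}ms using Contains",
            TimeMs(() => str.Contains("H"), N));
        // StartsWith(string) uses the current culture by default.
        Console.WriteLine("{0}ms using StartsWith (current culture)",
            TimeMs(() => str.StartsWith("H"), N));
        // The same call with an explicit ordinal comparison.
        Console.WriteLine("{0}ms using StartsWith (ordinal)",
            TimeMs(() => str.StartsWith("H", StringComparison.Ordinal), N));
    }
}
```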
Community
Kelsey
  • 9
    Yes! This is correct. As Daniel pointed out in another comment, passing StringComparison.Ordinal to StartsWith will make StartsWith much faster than Contains. I just tried it and got "748.3209ms using Contains 154.548ms using StartsWith" – StriplingWarrior Jun 25 '10 at 18:18
  • @StriplingWarrior, Stopwatch is not reliable for short processes either. There will always be variations between test runs. Getting 748 vs 154 is not enough evidence! So the question is, how many times did you run your short test? – usefulBee Sep 28 '16 at 16:22
  • 1
    @usefulBee: The original question's code repeats the method call ten million times, which puts us into the hundreds of milliseconds. That's usually enough to smooth out the variations when there's no I/O involved. [Here's a LINQPad script](http://share.linqpad.net/k7n66x.linq) that shows similar results in a more robust benchmark test bed. – StriplingWarrior Sep 29 '16 at 15:43
  • 1
    I just ran that Linqpad script: `Contains()`: 1310, `StartsWith()`:1630, `Starts, WithOrdinal`: 205. Yay `Ordinal` – CAD bloke Jul 14 '17 at 02:15
29

I figured it out. It's because StartsWith is culture-sensitive, while Contains is not. That inherently means StartsWith has to do more work.

FWIW, here are my results on Mono with the below (corrected) benchmark:

1988.7906ms using Contains
10174.1019ms using StartsWith

I'd be glad to see people's results on MS, but my main point is that correctly done (and assuming similar optimizations), I think StartsWith has to be slower:

using System;
using System.Diagnostics;

public class ContainsStartsWith
{
    public static void Main()
    {
        string str = "Hello there";

        Stopwatch s = new Stopwatch();
        s.Start();
        for (int i = 0; i < 10000000; i++)
        {
            str.Contains("H");
        }
        s.Stop();
        Console.WriteLine("{0}ms using Contains", s.Elapsed.TotalMilliseconds);

        s.Reset();
        s.Start();
        for (int i = 0; i < 10000000; i++)
        {
            str.StartsWith("H");
        }
        s.Stop();
        Console.WriteLine("{0}ms using StartsWith", s.Elapsed.TotalMilliseconds);

    }
}
Matthew Flaschen
  • 2
    Really good guess, but likely not. He's not passing in the culture, and this line is in the implementation of StartsWith: `CultureInfo info = (culture == null) ? CultureInfo.CurrentCulture : culture;` – Marc Bollinger Jun 25 '10 at 17:40
  • 2
    @Marc Bollinger - All you've shown there is that StartsWith is culture-sensitive, which is the claim. – Lee Jun 25 '10 at 17:46
  • @Marc, right. It's using the current culture. That's culture-sensitive, and some cultures rely on quite complex normalization rules. – Matthew Flaschen Jun 25 '10 at 17:46
  • 10
    StartsWith uses CurrentCulture by default, which means the comparison has to check for equalities like "æ"=="ae". Contains doesn't do those expensive checks. Pass StringComparison.Ordinal to StartsWith to make it as fast as Contains. – Daniel Jun 25 '10 at 17:47
  • 2
    Why does Microsoft pick different rules for different string methods? It's maddening! – Qwertie Jun 25 '10 at 18:23
10

StartsWith and Contains behave completely differently when it comes to culture-sensitive issues.

In particular, StartsWith returning true does NOT imply Contains returning true. You should replace one of them with the other only if you really know what you are doing.

using System;

class Program
{
    static void Main()
    {
        var x = "A";
        var y = "A\u0640";

        Console.WriteLine(x.StartsWith(y)); // True
        Console.WriteLine(x.Contains(y)); // False
    }
}
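
For comparison, here is a sketch of the same two strings with the comparison type made explicit. With an ordinal comparison the two methods agree, because no characters are treated as ignorable. The culture-sensitive result above may depend on the runtime and its globalization data, so only the ordinal results are annotated here.

```csharp
using System;

class OrdinalAgreement
{
    static void Main()
    {
        var x = "A";
        var y = "A\u0640"; // 'A' followed by U+0640 ARABIC TATWEEL

        // Ordinal: y is longer than x, so it cannot be a prefix of x.
        Console.WriteLine(x.StartsWith(y, StringComparison.Ordinal)); // False

        // Contains(string) is already an ordinal search.
        Console.WriteLine(x.Contains(y)); // False

        // The explicit form of what Contains does internally.
        Console.WriteLine(x.IndexOf(y, StringComparison.Ordinal) >= 0); // False
    }
}
```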
Zakharia Stanley
3

I twiddled around in Reflector and found a potential answer:

Contains:

return (this.IndexOf(value, StringComparison.Ordinal) >= 0);

StartsWith:

...
    switch (comparisonType)
    {
        case StringComparison.CurrentCulture:
            return CultureInfo.CurrentCulture.CompareInfo.IsPrefix(this, value, CompareOptions.None);

        case StringComparison.CurrentCultureIgnoreCase:
            return CultureInfo.CurrentCulture.CompareInfo.IsPrefix(this, value, CompareOptions.IgnoreCase);

        case StringComparison.InvariantCulture:
            return CultureInfo.InvariantCulture.CompareInfo.IsPrefix(this, value, CompareOptions.None);

        case StringComparison.InvariantCultureIgnoreCase:
            return CultureInfo.InvariantCulture.CompareInfo.IsPrefix(this, value, CompareOptions.IgnoreCase);

        case StringComparison.Ordinal:
            return ((this.Length >= value.Length) && (nativeCompareOrdinalEx(this, 0, value, 0, value.Length) == 0));

        case StringComparison.OrdinalIgnoreCase:
            return ((this.Length >= value.Length) && (TextInfo.CompareOrdinalIgnoreCaseEx(this, 0, value, 0, value.Length, value.Length) == 0));
    }
    throw new ArgumentException(Environment.GetResourceString("NotSupported_StringComparison"), "comparisonType");

And there are some overloads so that the default culture is CurrentCulture.

So first of all, Ordinal will be faster anyway (if the match is close to the beginning of the string), right? And secondly, there's more logic here which could slow things down (although that part is trivial).
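
To show how little work the Ordinal case needs, here is a hand-written sketch of the same idea: a length check followed by a character-by-character comparison. `StartsWithOrdinal` is a hypothetical stand-in for illustration, not the framework's actual native routine.

```csharp
using System;

public static class OrdinalPrefix
{
    // Hypothetical stand-in for the Ordinal case: a length check, then a
    // comparison of the first value.Length characters by their numeric value.
    public static bool StartsWithOrdinal(string text, string value)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (value == null) throw new ArgumentNullException("value");
        if (text.Length < value.Length) return false;
        for (int i = 0; i < value.Length; i++)
        {
            if (text[i] != value[i]) return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(StartsWithOrdinal("Hello there", "H"));            // True
        Console.WriteLine(StartsWithOrdinal("Hello there", "there"));        // False
        Console.WriteLine(StartsWithOrdinal("Hello there", "Hello there!")); // False
    }
}
```

There are no culture tables to consult and no normalization to do, which fits the measurements others reported for `StringComparison.Ordinal`.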

hackerhasid
1

Here is a benchmark of using StartsWith vs Contains. As you can see, StartsWith using ordinal comparison is pretty good, and you should take note of the memory allocated for each method.

|                                   Method |         Mean |      Error |       StdDev |       Median |     Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------------------------------- |-------------:|-----------:|-------------:|-------------:|----------:|------:|------:|----------:|
|                         EnumEqualsMethod |  1,079.67 us |  43.707 us |   114.373 us |  1,059.98 us | 1019.5313 |     - |     - | 4800000 B |
|                             EnumEqualsOp |     28.15 us |   0.533 us |     0.547 us |     28.34 us |         - |     - |     - |         - |
|                             ContainsName |  1,572.15 us | 152.347 us |   449.198 us |  1,639.93 us |         - |     - |     - |         - |
|                        ContainsShortName |  1,771.03 us | 103.982 us |   306.592 us |  1,749.32 us |         - |     - |     - |         - |
|                           StartsWithName | 14,511.94 us | 764.825 us | 2,255.103 us | 14,592.07 us |         - |     - |     - |         - |
|                StartsWithNameOrdinalComp |  1,147.03 us |  32.467 us |    93.674 us |  1,153.34 us |         - |     - |     - |         - |
|      StartsWithNameOrdinalCompIgnoreCase |  1,519.30 us | 134.951 us |   397.907 us |  1,264.27 us |         - |     - |     - |         - |
|                      StartsWithShortName |  7,140.82 us |  61.513 us |    51.366 us |  7,133.75 us |         - |     - |     - |       4 B |
|           StartsWithShortNameOrdinalComp |    970.83 us |  68.742 us |   202.686 us |  1,019.14 us |         - |     - |     - |         - |
| StartsWithShortNameOrdinalCompIgnoreCase |    802.22 us |  15.975 us |    32.270 us |    792.46 us |         - |     - |     - |         - |
|      EqualsSubstringOrdinalCompShortName |  4,578.37 us |  91.567 us |   231.402 us |  4,588.09 us |  679.6875 |     - |     - | 3200000 B |
|             EqualsOpShortNametoCharArray |  1,937.55 us |  53.821 us |   145.508 us |  1,901.96 us | 1695.3125 |     - |     - | 8000000 B |

Here is my benchmark code: https://gist.github.com/KieranMcCormick/b306c8493084dfc953881a68e0e6d55b
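
On the allocation column: the sketch below is not taken from the gist. It only assumes, from the method name, that the `EqualsSubstring...` row builds a substring before comparing. It contrasts that approach with a prefix check that creates no intermediate string. Both helper names are made up for illustration.

```csharp
using System;

public static class PrefixChecks
{
    // Allocates: Substring generally creates a new string on each call.
    public static bool PrefixViaSubstring(string s, string prefix)
    {
        return s.Length >= prefix.Length
            && s.Substring(0, prefix.Length) == prefix;
    }

    // No intermediate string: compares the first prefix.Length characters in place.
    public static bool PrefixViaCompareOrdinal(string s, string prefix)
    {
        return s.Length >= prefix.Length
            && string.CompareOrdinal(s, 0, prefix, 0, prefix.Length) == 0;
    }

    public static void Main()
    {
        Console.WriteLine(PrefixViaSubstring("Hello there", "Hello"));      // True
        Console.WriteLine(PrefixViaCompareOrdinal("Hello there", "Hello")); // True
        Console.WriteLine(PrefixViaCompareOrdinal("Hello there", "there")); // False
    }
}
```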

0

Let's examine what ILSpy says about these two...

public virtual int IndexOf(string source, string value, int startIndex, int count, CompareOptions options)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }
    if (startIndex > source.Length)
    {
        throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_Index"));
    }
    if (source.Length == 0)
    {
        if (value.Length == 0)
        {
            return 0;
        }
        return -1;
    }
    else
    {
        if (startIndex < 0)
        {
            throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_Index"));
        }
        if (count < 0 || startIndex > source.Length - count)
        {
            throw new ArgumentOutOfRangeException("count", Environment.GetResourceString("ArgumentOutOfRange_Count"));
        }
        if (options == CompareOptions.OrdinalIgnoreCase)
        {
            return source.IndexOf(value, startIndex, count, StringComparison.OrdinalIgnoreCase);
        }
        if ((options & ~(CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreSymbols | CompareOptions.IgnoreKanaType | CompareOptions.IgnoreWidth)) != CompareOptions.None && options != CompareOptions.Ordinal)
        {
            throw new ArgumentException(Environment.GetResourceString("Argument_InvalidFlag"), "options");
        }
        return CompareInfo.InternalFindNLSStringEx(this.m_dataHandle, this.m_handleOrigin, this.m_sortName, CompareInfo.GetNativeCompareFlags(options) | 4194304 | ((source.IsAscii() && value.IsAscii()) ? 536870912 : 0), source, count, startIndex, value, value.Length);
    }
}

It looks like this considers culture as well, but falls back to a default.

public bool StartsWith(string value, StringComparison comparisonType)
{
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }
    if (comparisonType < StringComparison.CurrentCulture || comparisonType > StringComparison.OrdinalIgnoreCase)
    {
        throw new ArgumentException(Environment.GetResourceString("NotSupported_StringComparison"), "comparisonType");
    }
    if (this == value)
    {
        return true;
    }
    if (value.Length == 0)
    {
        return true;
    }
    switch (comparisonType)
    {
    case StringComparison.CurrentCulture:
        return CultureInfo.CurrentCulture.CompareInfo.IsPrefix(this, value, CompareOptions.None);
    case StringComparison.CurrentCultureIgnoreCase:
        return CultureInfo.CurrentCulture.CompareInfo.IsPrefix(this, value, CompareOptions.IgnoreCase);
    case StringComparison.InvariantCulture:
        return CultureInfo.InvariantCulture.CompareInfo.IsPrefix(this, value, CompareOptions.None);
    case StringComparison.InvariantCultureIgnoreCase:
        return CultureInfo.InvariantCulture.CompareInfo.IsPrefix(this, value, CompareOptions.IgnoreCase);
    case StringComparison.Ordinal:
        return this.Length >= value.Length && string.nativeCompareOrdinalEx(this, 0, value, 0, value.Length) == 0;
    case StringComparison.OrdinalIgnoreCase:
        return this.Length >= value.Length && TextInfo.CompareOrdinalIgnoreCaseEx(this, 0, value, 0, value.Length, value.Length) == 0;
    default:
        throw new ArgumentException(Environment.GetResourceString("NotSupported_StringComparison"), "comparisonType");
    }
}

By contrast, the only difference I see that appears relevant is an extra length check.

Matthew