26

I've noticed that

string1.Length == string2.Length && string1 == string2

on large strings is slightly faster than just

string1 == string2

Is this true? And is it good practice to compare the lengths of large strings before comparing the strings themselves?

CoolCodeBro
  • 1
    It's not hard to test: write a console app and time it with a Stopwatch. That is not the most accurate method, but it is good enough for this. I'll run the check and post an answer. – Misha Zaslavsky Oct 17 '13 at 20:22
  • 2
    How did you *notice* that? Any facts to backup this notice? Any sample test proving it? – Darin Dimitrov Oct 17 '13 at 20:22
  • 3
    @MishaZaslavsky It is amazingly hard to test, if by "testing" you also mean "generating a meaningful test input". – Sergey Kalinichenko Oct 17 '13 at 20:23
  • 9
    How do you know `string1 == string2` doesn't check the length first? – Conrad Frix Oct 17 '13 at 20:24
  • 2
    Unless you have a benchmark that indicates one is faster than the other **with your specific data**, the proper answer here is "Use whichever is more readable". – Ken White Oct 17 '13 at 20:24
  • 1
    @KenWhite Exactly. I'd disambiguate that statement by saying "which would be the second snippet" :) – Sergey Kalinichenko Oct 17 '13 at 20:25
  • Used a stopwatch and looped a million times each test. On average it was 5 ms when just comparing two strings vs 3 ms when checking lengths of strings first. Of course, the difference is not that big, but still. – CoolCodeBro Oct 17 '13 at 20:26
  • 2
    @ConradFrix It actually does ;) – Ralf Oct 17 '13 at 20:26
  • As an aside you need to be very careful when using `==` on a string see [Are string.Equals() and == operator really same?](http://stackoverflow.com/q/3678792/119477) Null values, empty strings, string interning, diacritical marks, and case sensitivity all cause problems. – Conrad Frix Oct 17 '13 at 20:37
  • 1
    I work in a 1e6-line c# app. When people wonder about performance, they wonder about things like string-compare, but the actual performance problems we have are *never* in stuff like string-compare. They are in stuff like a) reading resource files to get text to display to users who are wondering what's taking so long :), b) painting windows over and over because the "paint" handler gets added repeatedly but never taken away, c) adding data cells to a worksheet one by one when they could be batched up and done as a group. In other words, I recommend finding out what's *actually* a problem. – Mike Dunlavey Oct 17 '13 at 20:47

9 Answers

25

The string equality operator checks the lengths before comparing the chars, so this trick does not save you the comparison of the contents. You might still save a few CPU cycles because your length check assumes that the strings are not null, while the BCL must check for that. So if the lengths differ most of the time, you will short-circuit a few instructions.

I might just be wrong here, though. Maybe the operator gets inlined and the checks optimized out. Who knows for sure? (He who measures.)

If you care about saving every cycle, you should probably use a different strategy in the first place. Maybe managed code is not even the right choice. Given that, I recommend using the shorter form without the additional check.
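To make the null-handling difference concrete, here is a minimal sketch. The class and method names are made up for illustration. It only shows that the two forms agree for non-null strings, and that the explicit length check throws when the left operand is null, which is the case the built-in operator has to guard against.

```csharp
using System;

public static class LengthCheckSketch
{
    // The question's form: dereferences both strings, so neither may be null.
    public static bool WithLengthCheck(string a, string b)
    {
        return a.Length == b.Length && a == b;
    }

    // The plain form: the operator itself copes with null operands.
    public static bool Plain(string a, string b)
    {
        return a == b;
    }

    // Returns true if the explicit length check throws for the given inputs.
    public static bool ThrowsOnNull(string a, string b)
    {
        try
        {
            WithLengthCheck(a, b);
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }
}
```

So the two expressions are only interchangeable when both strings are known to be non-null.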

usr
  • 4
    `So if the lengths are not equal most of the time, you will short-circuit a few instructions.` And now it's a statistics problem – jamesSampica Oct 17 '13 at 20:37
  • Yes this is most likely why I've noticed the very slight performance difference when looping a million times, it just skipped the null check for both values, saving a few cycles. The difference is too insignificant to put an extra line there. – CoolCodeBro Oct 17 '13 at 21:11
  • According to the reference source, the `string` operator overload `==` calls `String.Equals()`, which only raises more questions about the observed difference in performance... – Softerware Apr 12 '17 at 18:38
  • Presumably you also avoid the overhead of a method call when using `==` on `String.Length`s. – NetMage Jun 23 '21 at 19:25
19

The string equality operator (==) internally calls string.Equals, so just use string.Equals or == as provided by the framework. It is already optimized well enough.

It first compares references, then lengths, and then the actual characters.

You can find the source code here

Code: (Source: http://www.dotnetframework.org/default.aspx/4@0/4@0/DEVDIV_TFS/Dev10/Releases/RTMRel/ndp/clr/src/BCL/System/String@cs/1305376/String@cs)

// Determines whether two strings match.
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
public override bool Equals(Object obj) {
    if (this == null)                        //this is necessary to guard against reverse-pinvokes and
        throw new NullReferenceException();  //other callers who do not use the callvirt instruction

    String str = obj as String;
    if (str == null)
        return false;

    if (Object.ReferenceEquals(this, obj))
        return true;

    return EqualsHelper(this, str);
}

and

[System.Security.SecuritySafeCritical]  // auto-generated
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
private unsafe static bool EqualsHelper(String strA, String strB)
{
    Contract.Requires(strA != null);
    Contract.Requires(strB != null);
    int length = strA.Length;
    if (length != strB.Length) return false;

    fixed (char* ap = &strA.m_firstChar) fixed (char* bp = &strB.m_firstChar)
    {
        char* a = ap;
        char* b = bp;

        // unroll the loop
#if AMD64
        // for AMD64 bit platform we unroll by 12 and
        // check 3 qword at a time. This is less code
        // than the 32 bit case and is shorter
        // pathlength

        while (length >= 12)
        {
            if (*(long*)a     != *(long*)b) break;
            if (*(long*)(a+4) != *(long*)(b+4)) break;
            if (*(long*)(a+8) != *(long*)(b+8)) break;
            a += 12; b += 12; length -= 12;
        }
 #else
        while (length >= 10)
        {
            if (*(int*)a != *(int*)b) break;
            if (*(int*)(a+2) != *(int*)(b+2)) break;
            if (*(int*)(a+4) != *(int*)(b+4)) break;
            if (*(int*)(a+6) != *(int*)(b+6)) break;
            if (*(int*)(a+8) != *(int*)(b+8)) break;
            a += 10; b += 10; length -= 10;
        }
  #endif

        // This depends on the fact that the String objects are
        // always zero terminated and that the terminating zero is not included
        // in the length. For odd string sizes, the last compare will include
        // the zero terminator.
        while (length > 0)
        {
            if (*(int*)a != *(int*)b) break;
            a += 2; b += 2; length -= 2;
        }

        return (length <= 0);
    }
}
Jason Haley
Habib
  • It just compares references unless you pass special options, anyways. That's because of the string intern pool. – It'sNotALie. Oct 17 '13 at 20:27
  • 1
    +1: `==` operator uses `int i = strA.Length; if (i != strB.Length) { return false; }` – Tim Schmelter Oct 17 '13 at 20:27
  • No, it compares values. Unless you cast the strings to objects first and use `==`. – Tim S. Oct 17 '13 at 20:27
  • 2
    Why the downvotes? Scroll down to EqualsHelpers to see that that is what's actually going on! – Sergey Kalinichenko Oct 17 '13 at 20:28
  • 5
    @It'sNotALie No, `==` compares the reference AND the value. C# is not Java – jamesSampica Oct 17 '13 at 20:29
  • None of this code is even reached if the lengths aren't equal. – Ben Voigt Oct 17 '13 at 20:33
  • BTW, posting that code is probably a copyright violation. I've been careful not to sign up to view shared source, even though it would be available to me, because I don't want there ever to be a question of creating a derivative. – Ben Voigt Oct 17 '13 at 20:35
  • @Shoe It doesn't on .NET 4.5 (for me, anyways) – It'sNotALie. Oct 17 '13 at 20:42
  • @BenVoigt, I am not sure about the copy right, If I check the link then Microsoft has its copyright, I have also specified the source, if there is an issue with the copyright, I think Stackoverflow can remove the code. – Habib Oct 17 '13 at 20:45
9

For the geeks among us, here's a page which does a great job at benchmarking numerous ways to compare strings.

In a nutshell, the fastest method appears to be the CompareOrdinal:

if (string.CompareOrdinal(stringsWeWantToSeeIfMatches[x], stringsWeAreComparingAgainst[x]) == 0)
{
//they're equal
}

The second-best way seems to be using either a Dictionary or a HashSet with the string you want to compare as the key.

Makes for an interesting read.
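As a rough illustration of the options mentioned above, the following sketch puts an ordinal comparison, an ordinal `string.Equals`, and a `HashSet<string>` lookup side by side. The class and method names are assumptions for this example only, and nothing here reproduces the linked benchmark or says which option is fastest on your data.

```csharp
using System;
using System.Collections.Generic;

public static class OrdinalComparisonSketch
{
    // Equality via CompareOrdinal: a result of 0 means the strings are equal.
    public static bool EqualByCompareOrdinal(string a, string b)
    {
        return string.CompareOrdinal(a, b) == 0;
    }

    // Equality via the ordinal overload of string.Equals.
    public static bool EqualByOrdinalEquals(string a, string b)
    {
        return string.Equals(a, b, StringComparison.Ordinal);
    }

    // Build the set once; each later Contains call hashes the candidate
    // instead of comparing it against every known string in turn.
    public static HashSet<string> BuildLookup(IEnumerable<string> known)
    {
        return new HashSet<string>(known, StringComparer.Ordinal);
    }
}
```

A lookup then reads `OrdinalComparisonSketch.BuildLookup(knownStrings).Contains(candidate)`. The set only pays off when the same collection is queried many times.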

Free Coder 24
7

My test results

Compare 10000 strings to 10000 other strings all the same length (256)

Time (s1 == s2): 32536889 ticks

Time (s1.Length == s2.Length) && (s1 == s2): 37380529 ticks

Compare 10000 strings to 10000 other strings Random length max 256

Time (s1 == s2): 27223517 ticks

Time (s1.Length == s2.Length) && (s1 == s2): 23419529 ticks

Compare 10000 strings to 10000 other strings Random length min 256 max 512

Time (s1 == s2): 28904898 ticks

Time (s1.Length == s2.Length) && (s1 == s2): 25442710 ticks

What I find mind-boggling is that comparing 10000 equal-length strings takes longer than comparing the same number of strings that are larger.

All these tests were run on exactly the same data.

user2888973
  • 1
    I'm a little late, but the reason that the equal length strings take longer is because the == operator first checks if they are null then checks if they are equal length before checking if the contents are identical. If the lengths are the same, it proceeds; if they are different, it stops there. – soxroxr Feb 16 '18 at 19:18
4

According to ILSpy, the string == operator is defined as:

public static bool operator ==(string a, string b)
{
    return string.Equals(a, b);
}

Which is defined as

public static bool Equals(string a, string b)
{
    return a == b || (a != null && b != null && a.Length == b.Length && string.EqualsHelper(a, b));
}

I assume that first a == b is actually a reference equality check (ILSpy is just rendering it as ==), otherwise this would be an infinitely recursive method.

This means that == already checks the lengths of the strings before actually comparing their characters.
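The order of checks described above can be sketched in plain managed code. This is a hypothetical re-implementation for illustration only (`EqualsSketch` and `SketchEquals` are invented names), not the framework's actual code, which uses an unsafe unrolled loop instead of the simple `for` loop here.

```csharp
public static class EqualsSketch
{
    // Mirrors the described order: reference check, null check,
    // length check, and only then a character-by-character comparison.
    public static bool SketchEquals(string a, string b)
    {
        if (object.ReferenceEquals(a, b)) return true;   // same instance, or both null
        if (a == null || b == null) return false;        // exactly one is null
        if (a.Length != b.Length) return false;          // cheap early exit

        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```

Two distinct instances with the same contents, such as `new string('x', 3)` and another `new string('x', 3)`, fail the reference check but still compare equal by value.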

p.s.w.g
  • 1
    This is what `ILSpy` yields for me (.NET 4; the length check is in `EqualsHelper`): `public static bool Equals(string a, string b) { return a == b || (a != null && b != null && string.EqualsHelper(a, b)); }` – Tim Schmelter Oct 17 '13 at 21:01
  • @TimSchmelter What version of the assembly are you looking at? The posted code is from 4.0.0.0? In 2.0.0.0 I see `return (value != null || this == null) && string.EqualsHelper(this, value);`. – p.s.w.g Oct 17 '13 at 21:06
  • @TimSchmelter That is indeed strange. I can distinctly see calls to `System.String::get_Length()` in the IL. And yes, I see *another* length check inside `EqualsHelper`. – p.s.w.g Oct 17 '13 at 21:19
3

With terminated strings (for example, null-terminated C strings), it makes sense to just start comparing characters, since you can't determine the string lengths without iterating over all the characters anyway, and the comparison is likely to exit early.

With length-counted strings, comparing the length should be done first, if you are testing for byte-wise equality. You can't even start accessing character data without retrieving the length, since one could be zero-length.

If you are doing a relational comparison, knowing the lengths are different doesn't tell you if the result should be positive or negative. And in a culture-aware comparison, equal strings do not imply equal lengths. So for both of those you need to just compare data.

If operator==(string, string) simply delegates to a relational comparison, you wouldn't expect that to compare lengths. Checking length before doing the comparison could therefore be a benefit. But it seems like the Framework does start with a length check.
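The point about culture-aware comparisons can be illustrated with a precomposed and a decomposed form of the same letter. This is a sketch under assumptions: the names are invented, and the result of the culture-sensitive call depends on the runtime's globalization configuration, so no output is claimed for it here.

```csharp
using System;

public static class CultureLengthSketch
{
    public const string Precomposed = "\u00E9";   // "é" as a single code unit
    public const string Decomposed = "e\u0301";   // "e" followed by a combining acute accent

    // Ordinal comparison looks at the raw code units, so differing lengths
    // already rule out equality.
    public static bool OrdinalEqual()
    {
        return string.Equals(Precomposed, Decomposed, StringComparison.Ordinal);
    }

    // Culture-sensitive comparison may treat both forms as the same text
    // even though their lengths differ, so a length pre-check is not valid here.
    public static bool CultureEqual()
    {
        return string.Equals(Precomposed, Decomposed, StringComparison.InvariantCulture);
    }
}
```

A length pre-check is therefore only safe in front of an ordinal equality test.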

Ben Voigt
0

I'd say the first one is faster if the result of string1.Length == string2.Length is false. Thanks to short-circuit evaluation (SCE), the actual comparison between the strings is then not made, which might save you time.

If the strings are equal however, the first one is slower since it will check the length first and then do the same thing as the second one.

See http://msdn.microsoft.com/en-us/library/2a723cdk.aspx for information about the && operator and SCE.
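A small sketch of the short-circuit behaviour described above: a counter records how often the right-hand operand of `&&` actually runs. The class, field, and method names are hypothetical and exist only for this example.

```csharp
public static class ShortCircuitSketch
{
    // Counts how many times the contents were actually compared.
    public static int ContentComparisons;

    private static bool CompareContents(string a, string b)
    {
        ContentComparisons++;
        return a == b;
    }

    // With &&, CompareContents runs only when the lengths are equal.
    public static bool LengthThenContents(string a, string b)
    {
        return a.Length == b.Length && CompareContents(a, b);
    }
}
```

When the lengths differ, the counter stays unchanged. When they match, the length check is extra work on top of the comparison that `==` performs anyway.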

JLe
0

As promised, I wrote a short program with a Stopwatch. You can copy and paste it, try it on different strings, and see the differences:

using System;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        string str1 = "put the first value";
        string str2 = "put the second value";
        CompareTwoStringsWithStopWatch(str1, str2); //Print the results.
    }

    private static void CompareTwoStringsWithStopWatch(string str1, string str2)
    {
        Stopwatch stopwatch = new Stopwatch();

        stopwatch.Start();
        for (int i = 0; i < 99999999; i++)
        {
            if (str1.Length == str2.Length && str1 == str2)
            {
                SomeOperation();
            }
        }
        stopwatch.Stop();

        Console.WriteLine("{0}. Time: {1}", "Result for: str1.Length == str2.Length && str1 == str2", stopwatch.Elapsed);
        stopwatch.Reset();

        stopwatch.Start();
        for (int i = 0; i < 99999999; i++)
        {
            if (str1 == str2)
            {
                SomeOperation();
            }
        }
        stopwatch.Stop();

        Console.WriteLine("{0}. Time: {1}", "Result for: str1 == str2", stopwatch.Elapsed);
    }

    private static int SomeOperation()
    {
        var value = 500;
        value += 5;

        return value - 300;
    }
}

My conclusions:

  1. I checked several strings (short ones and long ones) and all the results were almost the same. The first if (with the length check) was slower in about 2/3 of the cases.
  2. And you have an Equals method in the Object class, just use it :)
  3. You can try it and give us the results also :)
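A slightly more careful variant of the same stopwatch idea is sketched below. It is only a rough sketch: `CompareTiming` and `Measure` are names invented for this example, the warm-up count is arbitrary, and the delegate call adds overhead to both variants, so small differences should be treated with suspicion.

```csharp
using System;
using System.Diagnostics;

public static class CompareTiming
{
    // Times 'iterations' calls of 'compare' on (a, b).
    // 'matches' reports how many calls returned true, so the result of each
    // comparison is actually used rather than discarded.
    public static TimeSpan Measure(
        Func<string, string, bool> compare,
        string a,
        string b,
        int iterations,
        out int matches)
    {
        // Warm-up so JIT compilation is not part of the timed region.
        for (int i = 0; i < 1000; i++)
        {
            compare(a, b);
        }

        matches = 0;
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (compare(a, b))
            {
                matches++;
            }
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

It could be called as `CompareTiming.Measure((x, y) => x == y, str1, str2, 1000000, out int hits)` and again with `(x, y) => x.Length == y.Length && x == y`, ideally several times and in alternating order.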
Misha Zaslavsky
-1

If you expect the strings to differ in length most of the time, you can compare their lengths first and then compare the strings themselves using string.Compare. I got almost a 50% performance improvement by doing this:

if (str1.Length == str2.Length)
{
    if (string.Compare(str1, str2, StringComparison.Ordinal) == 0)
    {
       doSomething();
    }
}

In this case, I expect the strings to be different almost all the time, and I think str1.Length is far cheaper than comparing the actual strings. Only if they are equal in length do I compare them.

EDIT: Forget what I said. Just use == and be happy.