11

I was doing other experiments when this strange behaviour caught my eye.

The code is compiled as x64 Release.

If the input is 1, the 3rd run of the List method costs ~40% more time than the first 2. The output is:

List costs 9312
List costs 9289
Array costs 12730
List costs 11950

If the input is 2, the 3rd run of the Array method costs ~30% more time than the first 2. The output is:

Array costs 8082
Array costs 8086
List costs 11937
Array costs 12698

You can see the pattern. The complete code is attached below (just compile and run). {The code presented is the minimum needed to run the test. The actual code used to get reliable results is more complicated: I wrapped the methods and tested each 100+ times after a proper warm-up.}

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class ListArrayLoop
{
    readonly int[] myArray;
    readonly List<int> myList;
    readonly int totalSessions;

    public ListArrayLoop(int loopRange, int totalSessions)
    {
        myArray = new int[loopRange];
        for (int i = 0; i < myArray.Length; i++)
        {
            myArray[i] = i;
        }
        myList = myArray.ToList();
        this.totalSessions = totalSessions;
    }
    public void ArraySum()
    {
        var pool = myArray;
        long sum = 0;
        for (int j = 0; j < totalSessions; j++)
        {
            sum += pool.Sum();
        }
    }
    public void ListSum()
    {
        var pool = myList;
        long sum = 0;
        for (int j = 0; j < totalSessions; j++)
        {
            sum += pool.Sum();
        }
    }

}
class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        ListArrayLoop test = new ListArrayLoop(10000, 100000);

        string input = Console.ReadLine();


        if (input == "1")
        {
            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}",sw.ElapsedMilliseconds);
            sw.Reset();
            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();
            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();
            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);
        }
        else
        {
            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();
            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();
            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();
            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);
        }

        Console.ReadKey();
    }
}
colinfang
  • What pattern? A little (lot) more explanation would be helpful. – Polyfun Apr 12 '12 at 15:17
  • What happens if you make the test take 10000 milliseconds instead of 100? – Eric Lippert Apr 12 '12 at 15:30
  • @EricLippert The pattern still holds; I updated the results for the 10000 ms test – colinfang Apr 12 '12 at 15:34
  • That's quite odd; I don't know what is happening there. I would run your program with a profiler and see where the extra time is being spent; I don't see why the fourth test should be any more expensive than the first test. – Eric Lippert Apr 12 '12 at 15:41
  • Isn't this just a caching issue? When the 'differing' call is made, it is referencing a different data structure and thus must load that new structure into the cache. This would explain why both the 'differing' call and the 3rd run of the 'same' call are more expensive. Has anyone tested what happens if you make a 4th call after the 3rd call (i.e. array, array, list, array, array)? – docmanhattan Apr 12 '12 at 15:50
  • @docmanhattan List costs 7406, List costs 7385, Array costs 9327, List costs 9202, List costs 9238. Running a profiler now. – Scott Chamberlain Apr 12 '12 at 15:51
  • @docmanhattan: See my CW "answer." All subsequent calls are slower when mixed. – Austin Salonen Apr 12 '12 at 16:01

7 Answers

6

Contrived problems give you contrived answers.

Optimization should be done after code is written, not before. Write your solution the way that is easiest to understand and maintain. Then, if the program is not fast enough for your use case, use a profiling tool to find where the actual bottleneck is, not where you "think" it is.

Most optimizations people attempt in your situation amount to spending 6 hours to shave 1 second off the run time. Most small programs will not be run enough times to offset the cost spent trying to "optimize" them.


That being said, this is a strange edge case. I modified it a bit and am running it through a profiler, but I need to downgrade my VS2010 install so I can get .NET Framework source stepping back.


I ran it through the profiler using the larger example; I can find no good reason why it would take longer.

Scott Chamberlain
2

Your issue is your test. When you are benchmarking code, there are a few guiding principles you should always follow:

  1. Processor Affinity: Use only a single processor, usually not #1.
  2. Warmup: Always perform the test a small number of times up front.
  3. Duration: Make sure your test duration is at least 500ms.
  4. Average: Average together multiple runs to remove anomalies.
  5. Cleanup: Force the GC to collect allocated objects between tests.
  6. Cooldown: Allow the process to sleep for a short-period of time.

So using these guidelines and rewriting your tests I get the following results:

Run 1

Enter test number (1|2): 1
ListSum averages 776
ListSum averages 753
ArraySum averages 1102
ListSum averages 753
Press any key to continue . . .

Run 2

Enter test number (1|2): 2
ArraySum averages 1155
ArraySum averages 1102
ListSum averages 753
ArraySum averages 1067
Press any key to continue . . .

So here is the final test code used:

static void Main(string[] args)
{
    //We just need a single-thread for this test.
    Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2);
    System.Threading.Thread.BeginThreadAffinity();

    Console.Write("Enter test number (1|2): ");
    string input = Console.ReadLine();

    //perform the action just a few times to jit the code.
    ListArrayLoop warmup = new ListArrayLoop(10, 10);
    Console.WriteLine("Performing warmup...");
    Test(warmup.ListSum);
    Test(warmup.ArraySum);
    Console.WriteLine("Warmup complete...");
    Console.WriteLine();

    ListArrayLoop test = new ListArrayLoop(10000, 10000);

    if (input == "1")
    {
        Test(test.ListSum);
        Test(test.ListSum);
        Test(test.ArraySum);
        Test(test.ListSum);
    }
    else
    {
        Test(test.ArraySum);
        Test(test.ArraySum);
        Test(test.ListSum);
        Test(test.ArraySum);
    }
}

private static void Test(Action test)
{
    long totalElapsed = 0;
    for (int counter = 10; counter > 0; counter--)
    {
        try
        {
            var sw = Stopwatch.StartNew();
            test();
            totalElapsed += sw.ElapsedMilliseconds;
        }
        finally { }

        GC.Collect(0, GCCollectionMode.Forced);
        GC.WaitForPendingFinalizers();
        //cooldown
        for (int i = 0; i < 100; i++)
            System.Threading.Thread.Sleep(0);
    }
    Console.WriteLine("{0} averages {1}", test.Method.Name, totalElapsed / 10);
}

Note: Some people may debate the usefulness of the cool-down; however, everyone agrees that even if it's not helpful, it is not harmful. I find that on some tests it can yield a more reliable result; in the example above, though, I doubt it makes any difference.

csharptest.net
  • You warmed up both methods before the actual testing: Test(warmup.ListSum); Test(warmup.ArraySum);. However, my question is about the order of testing ListSum and ArraySum, so the symptom is hidden when you do your warmup. If you don't do the Array warm-up when you input 1, and don't do the List warm-up when you input 2, you will get the same result as I got. – colinfang Apr 19 '12 at 19:52
  • @colinfang The warmup is necessary unless you are trying to benchmark the .NET JIT compilation time. And no, I don't get the same results, due to the GC.Collect() and the averaging of multiple runs. Try it for yourself without the warmup if you like. – csharptest.net Apr 19 '12 at 20:30
  • I did; my result is similar to yours when I performed both warm-ups. However, if I input 1 and don't do the Array warm-up, my result is as follows: Enter test number (1|2): 1 Performing warmup... ListSum averages 0 Warmup complete... ListSum averages 942 ListSum averages 934 ArraySum averages 1038 ListSum averages 1132 – colinfang Apr 19 '12 at 21:24
  • I am not comparing ListSum to ArraySum. I am comparing the first ListSum to the third ListSum, and ListSum itself is properly warmed up. – colinfang Apr 19 '12 at 21:26
1

Way too much for a comment, so it's CW -- feel free to incorporate and I'll delete this. The given code is a little off to me, but the problem is still interesting: if you mix the calls, you get poorer performance. This code highlights it:

static void Main(string[] args)
{
    var input = Console.ReadLine();

    var test = new ListArrayLoop(10000, 1000);

    switch (input)
    {
        case "1":
            Test(test.ListSum);
            break;
        case "2":
            Test(test.ArraySum);
            break;
        case "3":
            // adds about 40 ms
            test.ArraySum();
            Test(test.ListSum);
            break;
        default:
            // adds about 35 ms
            test.ListSum();
            Test(test.ArraySum);
            break;
    }
}

private static void Test(Action toTest)
{
    for (int i = 0; i < 100; i++)
    {
        var sw = Stopwatch.StartNew();
        toTest();
        sw.Stop();
        Console.WriteLine("costs {0}", sw.ElapsedMilliseconds);
        sw.Reset();
    }
}
Austin Salonen
1

Lists are implemented in .NET with arrays, so the average performance should be the same (since you do not change the length of either).

It looks like you have averaged the Sum() times sufficiently; this could be a GC issue with the enumerator used in the Sum() method.
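To make the first claim concrete, here is a small sketch (my illustration, not part of the original answer): List&lt;T&gt;.Capacity exposes the length of the internal backing array, which the list reallocates with a bigger array as elements are added.

```csharp
using System;
using System.Collections.Generic;

class ListBackingSketch
{
    static void Main()
    {
        var list = new List<int>();
        for (int i = 0; i < 9; i++)
        {
            list.Add(i);
            // Capacity is the length of the internal backing array; it only
            // changes when the list allocates a bigger array and copies over.
            Console.WriteLine("Count={0} Capacity={1}", list.Count, list.Capacity);
        }
    }
}
```

The exact growth sequence you see is an implementation detail (the Microsoft implementation roughly doubles the array); the invariant is simply Capacity >= Count.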

Danny Varod
  • I did GC, array, GC, array, GC, list, GC, array ... actually – colinfang Apr 25 '12 at 17:08
  • @colinfang Could you run array, array, list, list, array, array and then list, list, array, array, list, list to compare results with? – Danny Varod Apr 25 '12 at 21:44
  • I tried, it doesn't make any difference. It is also confirmed by comment of Scott Chamberlain under my question post – colinfang Apr 25 '12 at 22:05
  • @colinfang So it is 8sec 8sec 11sec 11sec 11sec 11sec for both? – Danny Varod Apr 25 '12 at 22:10
  • it's 8,8,12,12,13,13 and 9,9,13,13,12,12 respectively – colinfang Apr 25 '12 at 23:44
  • In that case it might be caused by the break of the interface-resolution optimization that the CLR performs (as Ňuf mentioned). The optimization only works if the interface is only accessed via one type. Apparently, the first time you use another class it breaks the optimization and does not attempt to re-form it, even when the same class is then used 10K times in a row. You can check this by summing the array and list manually using the indexer property instead of via LINQ. – Danny Varod Apr 27 '12 at 13:13
1

Short answer: It is because the CLR has an optimization for dispatching methods called on interface-typed variables. As long as a particular interface method is called on the same type (the type implementing that interface), the CLR uses a fast dispatching routine (only 3 instructions) that merely checks the actual type of the instance and, on a match, jumps directly to the precomputed address of the particular method. But when the same interface method is then called on an instance of another type, the CLR switches dispatching to a slower routine (which can dispatch methods for any actual instance type).

Long answer: First, take a look at how the method System.Linq.Enumerable.Sum() is declared (I omitted the validity check of the source parameter because it's not important in this case):

public static int Sum(this IEnumerable<int> source)
{
    int num = 0;
    foreach (int num2 in source)
        num += num2;
    return num;
}

So all types that implement IEnumerable&lt;int&gt; can call this extension method, including int[] and List&lt;int&gt;. The foreach keyword is just an abbreviation for getting an enumerator via IEnumerable&lt;T&gt;.GetEnumerator() and iterating through all values. So this method actually does this:

public static int Sum(this IEnumerable<int> source)
{
    int num = 0;
    // (foreach also disposes the enumerator in a finally block; omitted here for brevity)
    IEnumerator<int> enumerator = source.GetEnumerator();
    while (enumerator.MoveNext())
        num += enumerator.Current;
    return num;
}

Now you can clearly see that the method body contains three calls on interface-typed variables: GetEnumerator(), MoveNext(), and Current (although Current is actually a property, not a method, reading its value just calls the corresponding getter method).

GetEnumerator() typically creates a new instance of some auxiliary class that implements IEnumerator&lt;T&gt; and is thus able to return all values one by one. It is important to note that in the case of int[] and List&lt;int&gt;, the enumerator types returned by GetEnumerator() for these two classes are different. If the argument source is of type int[], then GetEnumerator() returns an instance of type SZGenericArrayEnumerator&lt;int&gt;, and if source is of type List&lt;int&gt;, it returns an instance of type List&lt;int&gt;+Enumerator&lt;int&gt;.
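This is easy to check with a couple of lines (a sketch of mine, not from the original answer); the exact type names printed are internal CLR implementation details and vary between framework versions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class EnumeratorTypeSketch
{
    static void Main()
    {
        IEnumerable<int> asArray = new[] { 1, 2, 3 };
        IEnumerable<int> asList = asArray.ToList();

        // The two GetEnumerator() calls return instances of different types,
        // which is exactly what defeats the fast dispatch path described above.
        Console.WriteLine(asArray.GetEnumerator().GetType());
        Console.WriteLine(asList.GetEnumerator().GetType());
    }
}
```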

The two other methods (MoveNext() and Current) are repeatedly called in a tight loop, and therefore their speed is crucial for overall performance. Unfortunately, calling a method on an interface-typed variable (such as IEnumerator&lt;int&gt;) is not as straightforward as an ordinary instance method call. The CLR must dynamically find out the actual type of the object in the variable and then find out which of the object's methods implements the corresponding interface method.

The CLR tries to avoid doing this time-consuming lookup on every call with a little trick. When a particular method (such as MoveNext()) is called for the first time, the CLR finds the actual type of the instance on which the call is made (for example SZGenericArrayEnumerator&lt;int&gt;, in case you called Sum on an int[]) and finds the address of the method that implements the corresponding interface method for this type (that is, the address of SZGenericArrayEnumerator&lt;int&gt;.MoveNext()). Then it uses this information to generate an auxiliary dispatching stub, which simply checks whether the actual instance type is the same as when the first call was made (that is, SZGenericArrayEnumerator&lt;int&gt;) and, if it is, jumps directly to the method address found earlier. So on subsequent calls, no complicated method lookup is made as long as the type of the instance remains the same. But when a call is made on an enumerator of a different type (such as List&lt;int&gt;+Enumerator&lt;int&gt;, in the case of summing a List&lt;int&gt;), the CLR no longer uses this fast dispatching stub. Instead, another (general-purpose) and much slower dispatching routine is used.

So as long as Sum() is called on the array only, the CLR dispatches the calls to GetEnumerator(), MoveNext(), and Current using the fast method. When Sum() is called on the list too, the CLR switches to the slower dispatching method, and performance decreases.

If performance is your concern, implement your own separate Sum() extension method for every type on which you want to call Sum(). This ensures that the CLR will use the fast dispatching method. For example:

public static class FasterSumExtensions
{
    public static int Sum(this int[] source)
    {
        int num = 0;
        foreach (int num2 in source)
            num += num2;
        return num;
    }

    public static int Sum(this List<int> source)
    {
        int num = 0;
        foreach(int num2 in source)
            num += num2;
        return num;
    }
}

Or, even better, avoid using the IEnumerable&lt;T&gt; interface at all (because it still brings noticeable overhead). For example:

public static class EvenFasterSumExtensions
{
    public static int Sum(this int[] source)
    {
        int num = 0;
        for(int i = 0; i < source.Length; i++)
            num += source[i];
        return num;
    }

    public static int Sum(this List<int> source)
    {
        int num = 0;
        for(int i = 0; i < source.Count; i++)
            num += source[i];
        return num;
    }
}

Here are results from my computer:

  • Your original program: 9844, 9841, 12545, 14384
  • FasterSumExtensions: 6149, 6445, 754, 6145
  • EvenFasterSumExtensions: 1557, 1561, 553, 1574
Ňuf
  • Does it do harm to the speed of myArray.Sum() if I run myList.ToArray() beforehand? Even in a different scope? – colinfang Sep 14 '12 at 16:18
0

Hm, it really looks strange... My guess: you are calling Sum() on the variable pool, declared with var. As long as you are working with only one type (either list or array), the call to Sum() is unambiguous and can be optimized. By using a new class the call becomes ambiguous and must be resolved, so further calls cause a performance hit. I do not have a compiler at hand, so try loading another class which supports Sum() and compare the times. If I am right, I would expect another performance hit, but this time not as large.

Thorsten S.
  • a. There is no "var" in his code. b. "var" is resolved at compile time; therefore it has absolutely no effect at runtime. – Danny Varod Apr 25 '12 at 15:39
0

In my opinion it's caching (because of read-ahead). The first time you access an array, many of its elements get into the cache at once. This prefetching mechanism expects that the program will likely access memory near the address just requested.

Further calls already benefit from this (provided the array fits into the cache). When you change the method, the cache is invalidated and you need to fetch everything from memory again.

So calling list, array, list, array, list, array should be slower than list, list, list, array, array, array.

But this is not deterministic from the programmer's point of view, as you don't know the state of the cache or of the other units affecting caching decisions.

estani