Why function call takes so much time?

Question

I have the following insertion sort algorithm implementation:

private static void InsertionSort(List<int> array)
{
    for (int i = 1; i < array.Count; ++i)
    {
        for (int j = i; j > 0 && array[j - 1] > array[j]; --j)
        {
             //swap(array, j, j-1);

             int temp = array[j];
             array[j] = array[j-1];
             array[j-1] = temp;
        }
     }
}

private static void swap(List<int> array, int i, int j)
{
    int temp = array[i];
    array[i] = array[j];
    array[j] = temp;
}

When I run my algorithm with swap(array, j, j-1); it takes much more time (+2 seconds for 50000 elements) than if I do function body inline.

Why?

I assume that is release build? (Relevant links: http://stackoverflow.com/q/15713910/11683, http://stackoverflow.com/q/473782/11683) — GSerg, Jun 04 '16 at 15:54
Don't you need to say "ref" on the array argument to "swap"? — Mike Dunlavey, Jun 04 '16 at 16:13
How are you measuring times? post all code. Also check this: http://www.dotnetperls.com/optimization — Juan Carlos, Jun 04 '16 at 16:35

score 15 · Accepted Answer · edited May 23 '17 at 12:22

It isn't wrong to inline the method by hand, it just isn't necessary. Inlining small methods is one of the standard optimizations performed by the jitter. That doesn't always happen, but on .NET 4.6.1 both the x86 and the x64 jitters do inline swap() in this sample code. And more, they also unroll the inner loop to produce two swaps per pass, the kind of hand-optimization programmers usually skip.

Properly benchmarking a .NET app isn't always that straight-forward. Very important to run the Release build of your program. And to not use the debugger. Albeit the latter is easy to fix, use Tools > Options > Debugging > General > untick the "Suppress JIT optimization" option. There is no good reason to turn it back on.

You can now also see the generated machine code, set a breakpoint on InsertionSort() and when it hits use Debug > Windows > Disassembly. Tends to make people's eyes bleed but it is quite easy to see that you get two inlined swaps(). I'll spare you the assembly dump, just have a look-see. And you should clearly see the difference in the measurement. Here's what I get:

Running it 5 times with swap() on a List with 50,000 random integers on x64:

00:00:05.4447216
00:00:05.2928558
00:00:05.6960587
00:00:05.2835343
00:00:05.2809591

Same test but now swap() inlined by hand:

00:00:05.3015856
00:00:05.2877402
00:00:05.6369775
00:00:05.2603384
00:00:05.2616389

Takes just as long, as it should.

I would be remiss to not show the results I get with List.Sort():

00:00:00.0075878
00:00:00.0073398
00:00:00.0076528
00:00:00.0078046
00:00:00.0066319

Oh, I run Debug build of my program! What is the difference? — , Jun 04 '16 at 16:42
List.Sort(). what type of sorting algorithm it use?quick sort? — , Jun 04 '16 at 16:44
Well, you now know what the difference is, those optimizations I linked to are not performed so you do see the cost of the method call. It uses [introspection sort](https://en.wikipedia.org/wiki/Introsort). — Hans Passant, Jun 04 '16 at 16:49
Why does it unroll the loop to do exactly *two* swaps per pass? — Bergi, Jun 04 '16 at 22:05
Exactly how many copies you get of the loop's inner body varies. Most I've ever seen is 4, swap() is a bit beefy in machine code so 2 seems a reasonable choice — Hans Passant, Sep 01 '16 at 12:55

Why function call takes so much time?

1 Answers1