
I wrote a class which uses Stopwatch to profile methods and for/foreach loops. With for and foreach loops it tests a standard loop against a Parallel.For or Parallel.ForEach implementation.

You would write performance tests like so:

Method:

PerformanceResult result = Profiler.Execute(() => { FooBar(); });

For loop:

SerialParallelPerformanceResult result = Profiler.For(0, 100, x => { FooBar(x); });

ForEach loop:

SerialParallelPerformanceResult result = Profiler.ForEach(list, item => { FooBar(item); });
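A simplified sketch of the idea behind such a class (not the actual implementation; the members of the two result types below are placeholders) would look something like this:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public class PerformanceResult
{
    public long ElapsedMilliseconds { get; set; }   // hypothetical member
}

public class SerialParallelPerformanceResult
{
    public long SerialMilliseconds { get; set; }    // hypothetical member
    public long ParallelMilliseconds { get; set; }  // hypothetical member
}

public static class Profiler
{
    public static PerformanceResult Execute(Action action)
    {
        return new PerformanceResult { ElapsedMilliseconds = Time(action) };
    }

    public static SerialParallelPerformanceResult For(int fromInclusive, int toExclusive, Action<int> body)
    {
        // time a plain for loop, then the same work via Parallel.For
        long serial = Time(() => { for (int i = fromInclusive; i < toExclusive; i++) body(i); });
        long parallel = Time(() => Parallel.For(fromInclusive, toExclusive, body));
        return new SerialParallelPerformanceResult { SerialMilliseconds = serial, ParallelMilliseconds = parallel };
    }

    public static SerialParallelPerformanceResult ForEach<T>(IEnumerable<T> source, Action<T> body)
    {
        // time a plain foreach loop, then the same work via Parallel.ForEach
        long serial = Time(() => { foreach (T item in source) body(item); });
        long parallel = Time(() => Parallel.ForEach(source, body));
        return new SerialParallelPerformanceResult { SerialMilliseconds = serial, ParallelMilliseconds = parallel };
    }

    private static long Time(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}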

Whenever I run the tests (one of .Execute, .For or .ForEach) I put them in a loop so I can see how the performance changes over time.
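For example (using the hypothetical ElapsedMilliseconds member from the sketch above):

for (int run = 1; run <= 3; run++)
{
    PerformanceResult result = Profiler.Execute(() => { FooBar(); });
    Console.WriteLine("Method execution {0} = {1}ms", run, result.ElapsedMilliseconds);
}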

Example of performance might be:

Method execution 1 = 200ms
Method execution 2 = 12ms
Method execution 3 = 0ms

For execution 1 = 300ms (Serial), 100ms (Parallel)
For execution 2 = 20ms (Serial), 75ms (Parallel)
For execution 3 = 2ms (Serial), 50ms (Parallel)

ForEach execution 1 = 350ms (Serial), 300ms (Parallel)
ForEach execution 2 = 24ms (Serial), 89ms (Parallel)
ForEach execution 3 = 1ms (Serial), 21ms (Parallel)

My questions are:

  1. Why does performance change over time? What is .NET doing in the background to facilitate this?

  2. How/why is a serial operation faster than a parallel one? I have made sure the operations are complex enough to see the difference properly... yet in most cases serial operations seem faster!?

NOTE: For parallel processing I am testing on an 8-core machine.

Matthew Layton
  • If you vote down, can you please explain why! – Matthew Layton Sep 27 '12 at 19:58
  • Wrong method to measure timing, and using a micro benchmark which has no real-world benefit. Things like that. There is a little art to making a benchmark that actually proves anything. – TomTom Sep 27 '12 at 20:02
  • Ignore the first run, the code is getting JITted. I suspect the slowness of the second run is somehow related to this but I don't know. – Loren Pechtel Sep 27 '12 at 20:05
  • In addition to what Loren said, it's possible that the time to execute the work is so small, that all you are measuring is the overhead of what surrounds that work. Also, be aware that Stopwatch measures clock time, not cpu time, so it is affected by other things running during your timing tests. – hatchet - done with SOverflow Sep 27 '12 at 20:11
  • @hatchet, the reason I used a Stopwatch was in relation to a previous question I asked, and got this answer: http://stackoverflow.com/a/12618494/1033686. But if you say I need to measure CPU time, does .NET have anything to facilitate this? – Matthew Layton Sep 27 '12 at 20:23
  • This question isn't really specific enough to be answered, so I have voted to close as "not constructive". – John Saunders Sep 27 '12 at 20:27
  • @activwerx - Stopwatch is ok for doing casual timings and exploration, but you need to do several runs to average out noise, and discard outliers, and not have other things running at the same time that are heavy cpu consumers. You also need to time a substantial enough amount of work that noise and overhead are not more significant than what you are intending to measure. You also should realize the difference between clock time and cpu time, and the implications when comparing single thread execution to parallel. – hatchet - done with SOverflow Sep 27 '12 at 20:31
  • There is no guarantee that serial will be faster than parallel. Even the documentation warns to not to assume so. If two teams play each other 5 days in a row do you expect the same team to win every time and for the score to be the same. Computer is doing other stuff. Cache is moving around. – paparazzo Sep 27 '12 at 20:33
  • I did some simple tests on my own and found no difference over time. Can you share your code? I'll put mine in an answer as it doesn't fit here. – jbarrameda Jan 27 '15 at 19:02

2 Answers


After some more exploration into performance profiling, I have discovered that using a Stopwatch is not an accurate way to measure the performance of a particular task.

(Thanks hatchet and Loren for your comments on this!)

Reasons a stopwatch is not accurate:

  1. Measurements are calculated as elapsed wall-clock time in milliseconds, not CPU time (see the sketch after this list).
  2. Measurements can be influenced by background "noise" and thread intensive processes.
  3. Measurements do not take into account JIT compilation and overhead.
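On point 1: .NET can report CPU time for the current process via Process.TotalProcessorTime, which makes it possible to compare wall-clock time against CPU time. A rough illustration (not part of the profiler itself):

using System;
using System.Diagnostics;

class CpuVersusElapsed
{
    static void Main()
    {
        // CPU time consumed by the process (all threads) before the work
        TimeSpan cpuBefore = Process.GetCurrentProcess().TotalProcessorTime;
        Stopwatch sw = Stopwatch.StartNew(); // wall-clock time

        DoWork();

        sw.Stop();
        TimeSpan cpuUsed = Process.GetCurrentProcess().TotalProcessorTime - cpuBefore;

        Console.WriteLine("Elapsed: {0}ms", sw.ElapsedMilliseconds);
        Console.WriteLine("CPU:     {0}ms", (long)cpuUsed.TotalMilliseconds);
    }

    static void DoWork()
    {
        double total = 0;
        for (int i = 0; i < 10000000; i++)
        {
            total += Math.Sqrt(i);
        }
    }
}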

That being said, using a stopwatch is OK for casual exploration of performance. With that in mind, I have improved my profiling algorithm somewhat.

Where before it simply executed the expression that was passed to it, it now has the facility to iterate over the expression several times, building an average execution time. The first run can be omitted since this is where JIT kicks in, and some major overhead may occur. Understandably, this will never be as sophisticated as using a professional profiling tool like Redgate's ANTS profiler, but it's OK for simpler tasks!
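A sketch of that idea (simplified, with hypothetical names; the real class still returns the result types shown in the question):

using System;
using System.Diagnostics;

public static class Profiler
{
    // Runs the action several times and returns the average elapsed time,
    // optionally discarding the first (JIT-affected) run.
    public static double ExecuteAverage(Action action, int iterations = 10, bool discardFirstRun = true)
    {
        if (discardFirstRun)
        {
            action(); // warm-up: JIT compilation and other one-off overhead happen here
        }

        long totalMilliseconds = 0;
        for (int i = 0; i < iterations; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            totalMilliseconds += sw.ElapsedMilliseconds;
        }

        return (double)totalMilliseconds / iterations;
    }
}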

Matthew Layton
  • ANTS is certainly a very professional piece of software. Unfortunately, like basically all profilers, problems can easily hide from it, so you are unable to find them. Its primary method of summarization seems to be the "hot stack" which is the same as a "hot path". [*This post*](http://stackoverflow.com/a/25870103/23771) shows how easy it is for problems to hide from those. – Mike Dunlavey Jan 27 '15 at 22:23

As per my comment above: I did some simple tests on my own and found no difference over time. Can you share your code? I'll put mine in an answer as it doesn't fit here.

This is my sample code. (I also tried both static and instance methods, with no difference.)

using System;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        int to = 50000000;
        OtherStuff os = new OtherStuff();

        Console.WriteLine(Profile(() => os.CountTo(to)));
        Console.WriteLine(Profile(() => os.CountTo(to)));
        Console.WriteLine(Profile(() => os.CountTo(to)));
    }

    static long Profile(Action method)
    {
        Stopwatch st = Stopwatch.StartNew();
        method();
        st.Stop();
        return st.ElapsedMilliseconds;
    }
}

class OtherStuff
{
    public void CountTo(int to)
    {
        for (int i = 0; i < to; i++)
        {
            // some work...
            i++;
            i--;
        }
    }
}

A sample output would be:

331
331
334

Consider executing this method instead:

using System;
using System.Linq;
using System.Security.Cryptography;

class OtherStuff
{
    public string CountTo(Guid id)
    {
        using (SHA256 sha = SHA256.Create())
        {
            int x = default(int);
            for (int index = 0; index < 16; index++)
            {
                x = id.ToByteArray()[index] >> 32 << 16;
            }
            RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
            byte[] y = new byte[1024];
            rng.GetBytes(y);
            y = y.Concat(BitConverter.GetBytes(x)).ToArray();
            return BitConverter.ToString(sha.ComputeHash(BitConverter.GetBytes(x).Where(o => o >> 2 < 0).ToArray()));
        }
    }
}

Sample output:

11 
0 
0
jbarrameda
  • I think to some extent, this depends on the time/space complexity of the method you are profiling. Here you are demonstrating an O(1) time complexity (The time taken to execute the method is constant, because the method is always doing the same work). Consider running your profiler against a more complex method. (I have added one to the bottom of your answer) – Matthew Layton Jan 27 '15 at 21:16
  • Good point on O(n) order (I guess that's what you meant, right?). So it seems in your original question that you are basically measuring overhead. However, I'm still wondering why the third execution is faster than the second. What overhead could be present in the 2nd execution that is not present in 3rd execution? In the example using `Guid`, overhead seems to be present only in 1st execution of the method. – jbarrameda Jan 28 '15 at 17:54
  • The first execution is likely down to JIT'ing and other stuff .NET does in the background. The difference in time between the second and third execution was so minuscule that I just put it down to CPU "noise". For really accurate results, and understanding how your code is working, this solution doesn't really offer much benefit over big name profilers, but I still think it's useful just to see how long on average something takes to run – Matthew Layton Jan 28 '15 at 18:22
  • I think 20ms in the 2nd run down to 0ms in the 3rd run is a lot for CPU "noise". I would test with maybe up to 20 runs to verify. In any case, thank you for this post. Regarding the big profilers, sometimes they are not available, maybe for $ reasons or, as in my case, I use this info (the estimated running cost) within the application, and I'm OK with having a rough estimate. – jbarrameda Jan 29 '15 at 18:36