How can I measure cold-code performance?

Question

Suppose I have two methods, Foo and Bar, that do roughly the same thing, and I want to measure which one is faster. Also, a single execution of either Foo or Bar is too fast to measure reliably.

Normally, I'd simply run them both a huge number of times like this:

var sw = new Stopwatch();
sw.Start();
for (int ii = 0; ii < HugeNumber; ++ii)
    Foo();
sw.Stop();
Console.WriteLine("Foo: " + sw.ElapsedMilliseconds);
// and the same code for Bar

But this way, every run of Foo after the first will probably be working with the processor cache, not actual memory, which is probably way faster than in a real application. What can I do to ensure that my method runs cold every time?

Clarification: By "roughly the same thing" I mean that both methods are used in the same way, but the actual algorithms may differ significantly. For example, Foo might be doing some tricky math, while Bar skips it by using more memory.

And yes, I understand that methods running cold will not have much effect on overall performance. I'm still interested in which one is faster.

What does it matter if it gets cached? They both will, so you'll still know which one is faster, right? — Rob, Dec 01 '11 at 08:29
Or you can start the stopwatch after the first iteration... But that's not really necessary, as Rob pointed out. — Kornelije Petak, Dec 01 '11 at 08:33
@Rob They both will be cached, and both will be faster. But the exact gains in performance will not be equal: suppose Foo does some tricky math, and Bar cuts around it by using more memory. In this case, Bar will gain more performance from being run hot. — Nevermind, Dec 01 '11 at 08:37
@Nevermind quote: `that do roughly the same thing`, nuff said? Not that this answers your original question, but that's why it's a comment :D — Rob, Dec 01 '11 at 08:39
You actually have to call your methods once before the stopwatch starts; otherwise you may include JIT compilation time in your measurement. — Guillaume, Dec 01 '11 at 08:40
I feel you could get the `SystemTime` at the start and end of the method and then calculate the `TimeSpan` between them. It may solve what you want. — Sandy, Dec 01 '11 at 08:51
Seems like a premature optimization to me. If you can't run/profile it in the context of a whole program's benchmark, then you're not running it in a realistic scenario anyhow. And if you are running it via a whole program's benchmark, you don't have to rely on highly fabricated scenarios like this. — Merlyn Morgan-Graham, Dec 01 '11 at 09:49
@Merlyn Morgan-Graham I never said anything about optimization, only about measuring. It's more of a curiosity thing. Although this test can make sense in an optimization context if I, say, had a program with hundreds of different calls to Foo that happen every few milliseconds... this way, they run cold, have some slight effect on overall performance, and a whole-program measurement is not desirable. — Nevermind, Dec 01 '11 at 11:58

score 1 · Accepted Answer · edited May 23 '17 at 12:03

First of all, if Foo is working with the processor cache, then Bar will also be working with the processor cache, won't it? So both of your functions get the same privilege. Now suppose the first run of Foo takes time A, and every later run takes an average time B because it is working with the processor cache. The total time will then be

A + B*(HugeNumber-1)

Similarly, for Bar it will be

C + D*(HugeNumber-1) // where C is the first run's time and D is the average run time using the processor cache

If I am not wrong, the result here depends mainly on B and D, both of which are average run times using the processor cache. So if you want to find out which of your functions is better, I think the processor cache is not a problem, as both functions are supposed to use it.

Edited:

I think it's clear now. Since Bar skips some tricky math by using memory, it will gain a small advantage (maybe nanoseconds or picoseconds) from running hot. To remove that advantage, you have to flush your CPU cache inside your for loop. Since you will be doing the same thing in both loops, you should then get a better idea of which function is better. There is already a Stack Overflow discussion on how to flush the CPU cache. Please visit this link; I hope it helps.
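
For what it's worth, here is a rough sketch of what "flushing inside the loop" could look like if you want to stay in managed code. It is not a real cache flush: it only walks a large scratch buffer before each timed call, which tends to push previously cached data out. The buffer size (64 MB), the 64-byte cache line, the names EvictCache and MeasureCold, and the bodies of Foo and Bar are all assumptions made for illustration; adjust them for your machine and your real methods.

```csharp
using System;
using System.Diagnostics;

static class ColdBenchmark
{
    // Assumption: 64 MB is larger than the last-level cache of the test machine.
    const int EvictionBytes = 64 * 1024 * 1024;
    // Assumption: cache lines are 64 bytes long.
    const int CacheLineBytes = 64;

    static readonly byte[] Scratch = new byte[EvictionBytes];
    static readonly double[] Table = new double[1024]; // used by the placeholder Bar
    static double _result; // sink so the placeholder work is not discarded
    static int _sink;      // sink so the eviction loop is not discarded

    // Hypothetical stand-ins for the real methods from the question.
    static void Foo() { _result = Math.Sqrt(12345.678); }
    static void Bar() { _result = Table[123]; }

    // Touches one byte per assumed cache line of a large buffer, so that
    // whatever the measured method had in cache is likely to be evicted.
    static void EvictCache()
    {
        int sum = 0;
        for (int i = 0; i < Scratch.Length; i += CacheLineBytes)
        {
            Scratch[i]++;
            sum += Scratch[i];
        }
        _sink = sum;
    }

    // Returns the average duration of one call in nanoseconds,
    // evicting the cache before every timed call.
    static double MeasureCold(Action method, int runs)
    {
        method(); // run once untimed so JIT compilation is not measured

        long ticks = 0;
        for (int i = 0; i < runs; i++)
        {
            EvictCache(); // not included in the timed region
            long start = Stopwatch.GetTimestamp();
            method();
            ticks += Stopwatch.GetTimestamp() - start;
        }
        return ticks * 1e9 / Stopwatch.Frequency / runs;
    }

    static void Main()
    {
        const int runs = 200;
        Console.WriteLine("Foo cold: " + MeasureCold(Foo, runs).ToString("F0") + " ns per call");
        Console.WriteLine("Bar cold: " + MeasureCold(Bar, runs).ToString("F0") + " ns per call");
    }
}
```

Keep in mind that each timed region now contains a single call, so the timer's own overhead and resolution become part of the number. If one call is far below that resolution, the comparison will be noisy no matter how many runs you average.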

Edit details: Improved answer and corrected spellings

I'm interested specifically in A and C in your example, not B and D. — Nevermind, Dec 01 '11 at 08:45
That link is actually helpful, although I was hoping for a simpler solution... at least, staying on the "managed" side of things (-8 — Nevermind, Dec 01 '11 at 09:06

score 0 · Answer 2 · answered Dec 01 '11 at 08:35

But assuming Foo and Bar are similar enough, any cache speedup (or any other environmental factor) should affect both equally. So even though you might not be getting an accurate absolute measure of cold performance, you should still observe a relative difference between the algorithms if one exists.

Also remember that if these functions are called in the inner loop of your system (otherwise why would you care so much about their performance), in the real world they're likely to be kept in the cache anyway, so by using your code you're likely to get a decent approximation of real world performance.
