Actually, it is not well established that more samples are a good thing.
It is nothing more than common wisdom.
I think you are sharing in a general confusion about the purpose of profiling: is it to measure performance, or to find speedups?
For measuring performance, you don't need samples at all.
What you need is a stopwatch, whether implemented in software or not.
If your process runs too quickly for the resolution of the stopwatch, just run your process 10^3 or 10^6 times, measure it, and divide by that number.
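For example, a minimal Python sketch of that technique (`busy_work` is a hypothetical stand-in for whatever you are timing):

```python
import time

def busy_work():
    # hypothetical stand-in for the operation being measured
    sum(i * i for i in range(100))

REPS = 10**6  # run many times so the total exceeds the stopwatch's resolution
start = time.perf_counter()
for _ in range(REPS):
    busy_work()
elapsed = time.perf_counter() - start
print(f"per-run time: {elapsed / REPS * 1e9:.1f} ns")
```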
For finding speedups, sampling the call stack is very effective, provided the samples contain line-level or instruction-level call site information.
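As one concrete way to take such samples, here is a rough sketch using Python's standard-library `faulthandler`; the `work` function is a made-up placeholder for the real program:

```python
import faulthandler
import signal
import time

# Each SIGUSR1 the process receives dumps every thread's stack,
# with file names and line numbers -- that dump is one "sample".
faulthandler.register(signal.SIGUSR1, all_threads=True)

def work():
    # made-up placeholder for the real workload
    while True:
        time.sleep(0.1)

if __name__ == "__main__":
    work()
```

While it runs, `kill -USR1 <pid>` from another shell takes one sample; you then read each dump looking for activity that could be removed. (SIGUSR1 is POSIX-only, so this particular sketch won't work on Windows; pausing under a debugger and reading the stack serves the same purpose.)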
How many samples do you need?
Well, if you see it doing something that could be removed on one sample, that probably doesn't mean much.
But if you see it on two samples, you can estimate the fraction of time F it is costing as roughly 2/N, where N is the number of samples.
Example: if you see it twice in 10 samples, it is costing roughly 20% of the time.
In general, if the speedup is going to save you fraction F of time, it takes on average 2/F samples to see it twice.
Example: if it is going to save 30% of time (F = 0.3), you need on average 2/0.3 ≈ 6.7 samples to see it twice.
Of course, if you see it more than twice, all the better.
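If you want to sanity-check that arithmetic, here is a small Monte Carlo sketch (my own illustration, assuming each sample independently lands on the removable activity with probability F):

```python
import random

def samples_until_seen_twice(f):
    """Draw samples until the costly activity (probability f per sample)
    has appeared twice; return how many samples that took."""
    seen = 0
    n = 0
    while seen < 2:
        n += 1
        if random.random() < f:
            seen += 1
    return n

F = 0.3
trials = 100_000
avg = sum(samples_until_seen_twice(F) for _ in range(trials)) / trials
print(f"average samples to see it twice: {avg:.2f} (theory: {2/F:.2f})")
```

For F = 0.3 the printed average comes out near the theoretical 6.67.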
Bottom line, for finding speedups, you don't need a lot of samples.
What you do need is to examine each one for activity that could be removed.
What you don't need is to mush them together into "statistics" (like most profilers do).
Not many people understand this.
If you want a somewhat more rigorous explanation, look here.