How to calculate std dev, quartile, ... with benchmarking a code?

Question

I wrote some functions to benchmark a function/piece of code. I do it like this:

start = timer
for(1 second)
   call fun
   iterations++
stop = timer

And then I have a MEAN (AVERAGE) time: (stop-start/iterations), right?

Single call is too 'short' to measure, so how can I calculate from this type of measurement, std dev, quartile, etc ... ?

You can't. There is no meaningful way (afaik) to define a standard deviation or percentiles for one single measurement. — Carsten, Aug 29 '13 at 15:05
To do std dev, etc, on your call you'd have to be measuring the time per call, which you indicated already is too short to really measure. Then you'd have a collection of time measurements to analyze. The only alternative would be to run your 1 second loop many times and do std dev, etc, on all of the measured averages. But it's not clear that would be very meaningful. — lurker, Aug 29 '13 at 15:08
@Carsten: So how can I do (how should I do) a measurement to be able to get this statistic values? — nullpointer, Aug 29 '13 at 15:09

score 4 · Answer 1 · answered Aug 29 '13 at 15:10

Standard deviation and quartiles both deal with the distribution of values in a group.

With only one measurement, these become trivial or meaningless. Since there's only one measurement, that value is the mean, the minimum, the maximum, and the mode. Since none of the measurements deviate from the mean, the variance and standard deviation are zero.

score 1 · Answer 2 · edited May 23 '17 at 10:25

1

You'll have to find a way to measure the time precisely enough. You'll need the times for individual calls to fun in order to get any meaningful standard deviation etc.

This question may contain useful hints, and I'm sure there are quite a few platform-specific high-resolution timers out there as well.

edited May 23 '17 at 10:25

Community

1
1

answered Aug 29 '13 at 15:13

Hulk

6,399
1
30
52

any ideas for a precise timing? – nullpointer Aug 29 '13 at 15:20
Updated my answer, but I haven't done any high-resolution timing myself recently. I'm not sure about the state of ``s `high_resolution_clock` on windows, for example, perhaps this has improved by now? – Hulk Aug 29 '13 at 15:36
No, doesn't look like it - boost chrono is probably still the better way to go. – Hulk Aug 29 '13 at 15:43
Thanks! I use `getrusage` and `clock_gettime` but they're still not precise enough – nullpointer Aug 29 '13 at 15:43
Huh, I can't use a `` because I write in pure C. Any chances to see a `chrono` source code to see how its implemented? – nullpointer Aug 29 '13 at 15:52
1

Well, I assumed C++ due to the tags in your question. So I take it you are working on Linux. Boost is open source, so you can have a look at their code, but it's usually not very easy to read because of lots of platform/compiler `#ifdef`s. You'll have to research a bit in how to do precision timing in C, perhaps ask a new question with the appropriate tags set - but only after you tried yourself a little ;) – Hulk Aug 29 '13 at 16:00

score 1 · Answer 3 · answered Aug 29 '13 at 20:04

In general, due to the processing speed and the troubles obtaining micro and millisecond resolutions, most performance measurements are based on a large number of iterations.

For example:

Read start time
for 1,000,000 iterations do
perform function
end-for
read end time.

The duration is the end time - start time. The average execution time is the duration divided by the number of iterations.

There are other reasons for using the average time: interruptions by OS, data cache misses and maybe external factors (such as hard drive accesses).

For more exact measurements, you will have to use a "test point" and an oscilliscope. Write high pulse to the test point before the iterations and write a low pulse afterwards. Set the oscilloscope to capture the duration. If your oscilloscope has statistical functions and storage, move the test point writes to before and after the function execution.

score 1 · Answer 4 · answered Aug 29 '13 at 22:17

If a single call is too short to measure, then why do you care how long it takes?

I'm being a bit facetious, but if you're on Intel Linux, and your process is pinned to one core, you can read the CPU's timestamp counter (TSC), which is the highest resolution tick you can get. In recent Intel CPUs it ticks very solidly at the nominal CPU frequency independent of the actual frequency (which varies wildly). If you Google for "rdtsc", you'll find several implementations for a rdtsc() function that you can just call. You could then try something like:

uint64_t tic, elapsed[10000];

for(i=0; i<10000; i++) {
    tic = rdtsc()
    my_func()
    elapsed[i] = tic - rdtsc()
}

That might get you within shouting distance of a maybe kinda/sorta semi-valid values for individual function calls, from which you can then produce whatever statistics you want (mean/mode/median/variance/std.dev.). The validity of this is seriously open to question, but it's the best that can be done with anything like your method. I'd be much more inclined to run the whole application under perf record and then use perf report to see where the cycles are being expended and focus on that.

How to calculate std dev, quartile, ... with benchmarking a code?

4 Answers4