TLDR: You can get a pretty good idea about hotspots with millisecond resolution, but nanosecond resolution doesn't work for various reasons.
You can probably find or write some function that gives you the best resolution your computer can provide, but even that still doesn't give you meaningful results:
auto start = getBestPrecisionTime();
foo();
auto end = getBestPrecisionTime();
std::cout << "foo took " << to_nanoseconds(end - start) << "ns";
The first issue is that foo() gets interrupted by another program, so you are not actually measuring foo() but foo() + some_random_service. One way around that is to make 1000 measurements, hope that at least one of them wasn't interrupted, and take the minimum. Depending on how long foo() actually takes, your chances of that are anywhere from always to never.
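A minimal sketch of that repeat-and-take-the-minimum idea could look like this (foo() and the repetition count of 1000 are placeholders, and std::chrono::steady_clock stands in for getBestPrecisionTime):
#include <algorithm>
#include <chrono>

void foo();  // the function under test

std::chrono::nanoseconds measure_foo()
{
    auto best = std::chrono::nanoseconds::max();
    for (int i = 0; i < 1000; ++i)
    {
        auto start = std::chrono::steady_clock::now();
        foo();
        auto end = std::chrono::steady_clock::now();
        // keep the fastest run, hoping it is one that was not interrupted
        best = std::min(best, std::chrono::duration_cast<std::chrono::nanoseconds>(end - start));
    }
    return best;
}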
Similarly, foo() probably accesses memory that sits somewhere in the level 1/2/3/4 caches, in RAM, or on the hard drive, so again you are measuring the wrong thing. You would need real-world data on how likely it is that the memory foo() needs is in each of those places and what the access times are.
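You can see the effect by timing two back-to-back calls: the first often runs with cold caches, the second with warm ones, and neither number is necessarily representative of your real workload (this reuses the same getBestPrecisionTime/to_nanoseconds placeholders as above):
auto t0 = getBestPrecisionTime();
foo();                              // data likely cold: RAM or even disk
auto t1 = getBestPrecisionTime();
foo();                              // data likely warm: L1/L2 cache
auto t2 = getBestPrecisionTime();
std::cout << "cold: " << to_nanoseconds(t1 - t0) << "ns, "
          << "warm: " << to_nanoseconds(t2 - t1) << "ns";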
Another major issue is optimization. It doesn't make much sense to measure the performance of a debug build, so you will want to measure with maximum optimization enabled. At a high optimization level the compiler will reorder and inline code. The getBestPrecisionTime function then has two options: allow the compiler to move code past it, or not. If it allows reordering, the compiler will do this:
foo();
auto start = getBestPrecisionTime();
auto end = getBestPrecisionTime();
std::cout << "foo took " << to_nanoseconds(end - start) << "ns";
and then optimize it further to
std::cout << "foo took 0ns";
Obviously this produces wrong results, so all timing functions I have come across add barriers to disallow such reordering.
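With GCC or Clang, for example, such a barrier can be an empty asm statement with a "memory" clobber; this is only a sketch of the idea and compiler-specific, not something the standard guarantees:
auto start = getBestPrecisionTime();
asm volatile("" ::: "memory");  // compiler may not move memory accesses across this
foo();
asm volatile("" ::: "memory");
auto end = getBestPrecisionTime();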
But the alternative is not much better. Without the measurement the compiler may optimize this
foo();
bar();
into
code_that_does_foo_bar;
which is more efficient due to better utilization of registers/SIMD instructions/caching/.... But once you insert the measurement you have disabled this optimization, and you are measuring the wrong version. With a lot of work you may be able to figure out which assembler instructions inside code_that_does_foo_bar originated from foo(), but since you can't even tell exactly how long a single assembler instruction takes, and that time also depends on the surrounding instructions, you have no chance of getting an accurate number for optimized code.
The best you can do is just use std::chrono::high_resolution_clock, because it simply doesn't get much more precise than that.