
I am starting to study algorithms and data structures seriously, and I am interested in learning how to compare the performance of the different ways I could implement them.

For simple tests, I can take the time before and after something runs, run that thing 10^5 times, and average the running times. I can parametrize the input by size, or sample random input, and get a list of running times vs. input size. I can output that as a CSV file and feed it into pandas.
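For example, a minimal sketch of what I have in mind (the workload here is just a placeholder for the real data-structure operation I would measure):

```cpp
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>

// Placeholder workload; substitute the operation you actually want to measure.
long long my_algorithm(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0LL);
}

int main() {
    constexpr int kRuns = 100000;        // 10^5 repetitions, as described above
    std::vector<int> input(1024, 1);
    long long sink = 0;                  // consume results so the loop isn't optimized away

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kRuns; ++i)
        sink += my_algorithm(input);
    auto end = std::chrono::steady_clock::now();

    auto total_ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    // One CSV row (input size, average nanoseconds per run) ready for pandas.
    std::cout << "size,avg_ns\n"
              << input.size() << "," << total_ns / kRuns << "\n";
    std::cerr << "checksum: " << sink << "\n";
}
```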

I am not sure this approach is free of caveats, though. I am also not sure what to do about measuring space complexity.

I am learning to program in C++. Are there convenient tools to achieve what I am trying to do?

    Possible duplicate of https://stackoverflow.com/questions/31195802/is-it-realistic-to-use-o3-or-ofast-to-compile-your-benchmark-code-or-will-it-r – Galik Mar 01 '18 at 07:17
    **Microbenchmarks are *hard***. If you reuse the same test data for every iteration, branch prediction will "learn" the pattern. e.g. I was trying to see just how badly a small BubbleSort sucked, and found that with the same input data every time, all the branches in a 16-element Bubble Sort predicted near-perfectly on my Skylake (i7-6700k) CPU (and branch misses are most of the perf cost). Besides that, it's hard to write a microbenchmark that even measures what you want and isn't optimized in a way that's unrealistic for the real context you want to use the code in. – Peter Cordes Mar 01 '18 at 07:48
    When making a microbenchmark, it helps to know asm, and know what's slow and fast on CPU hardware (including cache / memory performance effects), so you'll know which factors are probably important and which probably aren't when trying to isolate something into a microbenchmark. See some performance links [in SO's x86 tag wiki](https://stackoverflow.com/tags/x86/info) for this kind of info for x86. And especially **[What Every Programmer Should Know About Memory](https://stackoverflow.com/questions/8126311/what-every-programmer-should-know-about-memory/47714514#47714514)** – Peter Cordes Mar 01 '18 at 07:51
    https://www.quick-bench.com/ – bobobobo Oct 16 '21 at 19:08

3 Answers


Benchmarking code is not easy. What I found most useful is the Google Benchmark library. Even if you are not planning to use it, it is worth reading through some of its examples. It has many options for parametrizing tests, writing results to a file, and even estimating the Big-O complexity of your algorithm, to name just a few. If you are at all familiar with the Google Test framework, I would recommend using it. It also gives you ways to keep compiler optimization under control, so you can be sure that your code was not optimized away.
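For example, a minimal sketch of a Google Benchmark test that runs over a range of input sizes and asks the library to fit a complexity curve might look like this (the sorting workload is just an illustration; it assumes the library is installed and linked, e.g. with -lbenchmark):

```cpp
#include <benchmark/benchmark.h>
#include <algorithm>
#include <random>
#include <vector>

static void BM_SortVector(benchmark::State& state) {
    std::mt19937 rng(42);
    std::vector<int> data(state.range(0));
    for (auto& x : data) x = static_cast<int>(rng());

    for (auto _ : state) {
        std::vector<int> v = data;            // fresh copy each iteration
        std::sort(v.begin(), v.end());
        benchmark::DoNotOptimize(v.data());   // keep the result "used"
        benchmark::ClobberMemory();           // force the writes to actually happen
    }
    state.SetComplexityN(state.range(0));
}
// Run over a range of input sizes and let the library estimate the Big-O fit.
BENCHMARK(BM_SortVector)->RangeMultiplier(4)->Range(1 << 10, 1 << 18)->Complexity();

BENCHMARK_MAIN();
```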

There is also a great talk about benchmarking code from CppCon 2015: Chandler Carruth, "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!". It covers many of the mistakes you can make (and it also uses Google Benchmark).

wdudzik

This is operating-system and compiler specific (so implementation specific). You could use profiling tools, timing tools, etc.

On Linux, see time(1), time(7), perf(1), gprof(1), pmap(1), mallinfo(3), and proc(5), and read about Invoking GCC.
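As a rough sketch of measuring space as well as time, a process can also query its own CPU time and peak resident memory through getrusage(2) (a related call, not one of the tools listed above); on Linux, ru_maxrss is reported in kilobytes:

```cpp
#include <sys/resource.h>
#include <cstdio>
#include <vector>

int main() {
    // Allocate something non-trivial so the memory numbers mean something.
    std::vector<int> v(10 * 1000 * 1000, 1);
    std::printf("v[0] = %d\n", v[0]);   // keep the allocation "used"

    struct rusage ru {};
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        double user_s = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
        double sys_s  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
        // On Linux, ru_maxrss is the peak resident set size in kilobytes.
        std::printf("user %.3f s  sys %.3f s  peak RSS %ld kB\n",
                    user_s, sys_s, ru.ru_maxrss);
    }
}
```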

See also this. In practice, be sure that your runs last long enough (e.g. at least one second of CPU time per process).

Be aware that optimizing compilers can drastically transform your program. See Matt Godbolt's CppCon 2017 talk, "What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid".

Basile Starynkevitch
    For more about constructing microbenchmarks that optimize properly without optimizing away, see **[CppCon 2015: Chandler Carruth "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!"](https://www.youtube.com/watch?v=nXaxk27zwlk)**. He uses an empty inline asm statement with appropriate constraints (GNU syntax) to require the compiler to have a variable value in a register at some point. Chandler is a clang developer. (edit: this is the same talk linked in the other answer.) – Peter Cordes Mar 01 '18 at 07:41

From an architecture point of view, you can also benchmark your C++ code using architectural tools such as Intel Pin or the perf tool. You can use these to study how your code depends on the architecture. For example, you can compile your code at different optimization levels and check the IPC/CPI, cache accesses, and load/store accesses. You can even check whether your code is suffering a performance hit due to library functions. These tools are powerful and can give you potentially huge insights into your code.

You can also try disassembling your code, studying where it spends most of its time, and optimizing that. You can look at different techniques to ensure that frequently accessed data stays in the cache, and thus achieve a high hit rate.

Say you realize that your code is heavily dominated by loops; you can then run it with different loop bounds and compare the metrics in the two cases. For example, set the loop bound to 100,000 and find the desired performance metric X, then set the loop bound to 200,000 and find the performance metric Y. Now calculate Y - X. This gives you much better insight into the behaviour of the loops, because by subtracting the two metrics you effectively remove the static (fixed) costs of the code.
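A sketch of that differencing idea, with a stand-in loop body in place of your real workload:

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>

// Hypothetical loop-dominated workload; the bound is the parameter we vary.
std::uint64_t workload(std::uint64_t bound) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < bound; ++i)
        acc += i * i;                                 // stand-in for the real loop body
    return acc;
}

// Wall-clock time of one call, in nanoseconds.
double time_ns(std::uint64_t bound) {
    auto t0 = std::chrono::steady_clock::now();
    volatile std::uint64_t sink = workload(bound);    // volatile: keep the call live
    auto t1 = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(t1 - t0).count();
}

int main() {
    double x = time_ns(100000);   // metric X at loop bound 100,000
    double y = time_ns(200000);   // metric Y at loop bound 200,000
    // Y - X isolates the cost of the extra 100,000 iterations,
    // cancelling the fixed (static) costs common to both runs.
    std::cout << "X = " << x << " ns, Y = " << y
              << " ns, Y - X = " << (y - x) << " ns\n";
}
```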

Say you run your code 10 times for each input size. You can compute the runtime per input size, sort these measurements in ascending order, remove the first and last values (to discard outliers), and then take the average. Finally, compute the coefficient of variation to understand how the run times behave.
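A sketch of that trimming and coefficient-of-variation step, using made-up runtimes in place of real measurements:

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

int main() {
    // Assume these are runtimes (ms) from 10 runs at one input size.
    std::vector<double> runs = {12.1, 11.8, 12.4, 30.2, 12.0,
                                11.9, 12.3, 12.2, 9.1, 12.1};

    std::sort(runs.begin(), runs.end());
    // Drop the smallest and largest measurements to trim outliers.
    std::vector<double> trimmed(runs.begin() + 1, runs.end() - 1);

    double mean = 0;
    for (double r : trimmed) mean += r;
    mean /= trimmed.size();

    double var = 0;
    for (double r : trimmed) var += (r - mean) * (r - mean);
    var /= trimmed.size();

    double cv = std::sqrt(var) / mean;   // coefficient of variation = stddev / mean
    std::cout << "trimmed mean = " << mean << " ms, CV = " << cv << "\n";
}
```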

On a side note, more often than not we use the terms 'average' or 'arithmetic mean' too casually. Look at the metric you plan to average and consider harmonic, arithmetic, and geometric means for each case. For example, taking the arithmetic mean of rates gives incorrect answers, and simply averaging two events that do not occupy equal amounts of time can also give incorrect results; instead, use weighted arithmetic means.
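As a small worked example of why the choice of mean matters, assume two phases that process the same amount of work at different rates:

```cpp
#include <iostream>

int main() {
    // Two phases each process 60 MB of work, at 30 MB/s and 60 MB/s respectively.
    double r1 = 30.0, r2 = 60.0;

    double arithmetic = (r1 + r2) / 2.0;             // 45 MB/s -- not the true throughput
    double harmonic   = 2.0 / (1.0 / r1 + 1.0 / r2); // 40 MB/s -- total work / total time

    // Check against first principles: 120 MB of work in 2 s + 1 s = 3 s.
    double actual = 120.0 / 3.0;                     // 40 MB/s

    std::cout << "arithmetic mean: " << arithmetic
              << "  harmonic mean: " << harmonic
              << "  actual rate: " << actual << "\n";
}
```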