
I am looking at the Celero git repository to find out what DoNotOptimizeAway means, but I still don't get it. Could you please help me understand it in layman's terms, as much as you can?

The celero::DoNotOptimizeAway template is provided to ensure that the optimizing compiler does not eliminate your function or code. Since this feature is used in all of the sample benchmarks and their baseline, its time overhead is canceled out in the comparisons.

– gringo
  • There's an example in the comments https://github.com/DigitalInBlue/Celero/blob/4dfcb9d2326324af405df0037c73a351dd619e2f/include/celero/Utilities.h#L47 showing a for loop with no side effects being evaluated at compile time and removed entirely from the runtime. Is that helpful, or do you need more? – Rup Sep 06 '18 at 12:14
  • basically we really want to test the REAL performance of our algorithm without the interference of the optimization of the Compiler. I GOT IT – gringo Sep 06 '18 at 12:35
  • One of the difficulties of benchmarking is that if you don't do anything with the result of a calculation, a smart compiler may optimize the entire calculation away. – drescherjm Sep 06 '18 at 12:48
  • If the "REAL performance" of your algorithm is that of loading a constant, why don't you want the benchmark to show that? – Caleth Sep 06 '18 at 13:08
  • not exactly, of course you want to benchmark with optimizations. The thing is just that often a benchmark differs from the real case in that it gives the compiler more freedom for optimizations (for example because you actually don't use the result of some complex calculation), and this is what you want to avoid when benchmarking – 463035818_is_not_an_ai Sep 06 '18 at 13:09
  • Related Q&As about the same functions: [Preventing compiler optimizations while benchmarking](https://stackoverflow.com/q/40122141) / [Avoid optimizing away variable with inline asm](https://stackoverflow.com/q/44562871) / ["Escape" and "Clobber" equivalent in MSVC](https://stackoverflow.com/q/33975479) – Peter Cordes Aug 05 '20 at 18:28
  • happy New Year @PeterCordes. thanks for your comment. Even if it's 2 years late. I'll still be needing it for sure. – gringo Aug 06 '20 at 08:04

1 Answer


You haven't included the definition, just the documentation. I think you're asking for help understanding why it even exists, rather than the definition.

It stops compilers from CSEing and hoisting work out of repeat-loops, so you can repeat the same work enough times to be measurable. e.g. put something short in a loop that runs 1 billion times, and then you can measure the time for the whole loop easily (a second or so). See Can x86's MOV really be "free"? Why can't I reproduce this at all? for an example of doing this by hand in asm. If you want compiler-generated code like that, you need a function / macro like DoNotOptimizeAway.
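For instance, here's a minimal sketch of the idea (my own stand-in for celero::DoNotOptimizeAway, built on the GNU C inline-asm escape discussed further down; not Celero's actual harness):

```cpp
#include <chrono>
#include <cstdio>

// Stand-in for celero::DoNotOptimizeAway (GNU C/C++ only): the empty asm
// demands the value in a register, so the computation producing it can't
// be treated as dead code.
template <class T>
inline void DoNotOptimizeAway(T const& value) {
    asm volatile("" : : "r"(value));
}

static int work(int x) { return x * x + 1; }  // something short to measure

int main() {
    const long long iters = 1000000000LL;
    const auto t0 = std::chrono::steady_clock::now();
    for (long long i = 0; i < iters; ++i) {
        // The input varies per iteration so the work can't be hoisted out;
        // the escape keeps the otherwise-unused result from being deleted.
        DoNotOptimizeAway(work(static_cast<int>(i)));
    }
    const auto t1 = std::chrono::steady_clock::now();
    std::printf("%.3f sec total\n",
                std::chrono::duration<double>(t1 - t0).count());
}
```

Compiled at -O2 without the escape, the whole loop body is dead code and can legally be deleted; with it, each iteration's result has to actually be produced.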

Compiling the whole program with optimization disabled would be useless: storing/reloading everything between C++ statements gives very different bottlenecks (usually store-forwarding latency). See Adding a redundant assignment speeds up code when compiled without optimization.

See also Idiomatic way of performance evaluation? for general microbenchmarking pitfalls.


Perhaps looking at the actual definition can also help.

This Q&A (Optimization barrier for microbenchmarks in MSVC: tell the optimizer you clobber memory?) describes how one implementation of a DoNotOptimize macro works (and asks how to port it from GNU C++ to MSVC).

The escape macro is from Chandler Carruth's CppCon2015 talk, "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!". That talk also goes into detail about exactly why it's needed when writing microbenchmarks: to stop whole loops from optimizing away when you compile with optimization enabled.
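For reference, the two helpers from that talk look roughly like this (GNU C++ inline asm, reproduced from memory of the slides; these are what the MSVC question above is trying to port):

```cpp
// "escape": the pointer is visible to the (empty) asm, which might read or
// write the pointed-to object, so the object must really exist in memory
// with its current value at this point.
static void escape(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}

// "clobber": the asm might read or write any memory, so all pending stores
// have to actually happen before this point.
static void clobber() {
    asm volatile("" : : : "memory");
}
```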

(Stopping the compiler from hoisting things out of loops, instead of computing them repeatedly, is harder to get right when it's a problem. Making a function __attribute__((noinline)) can help if it's big enough that it didn't need to inline. Check the compiler's asm output to see how much setup it hoisted.)
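A tiny sketch of that attribute (hypothetical function; GNU C++ syntax):

```cpp
// noinline keeps the call boundary intact: the optimizer can't inline the
// body into the benchmark loop and then hoist its loop-invariant parts out.
__attribute__((noinline))
static int step(int x) {
    return x * 3 + 7;  // stands in for the real work under test
}
```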


And BTW, a good definition for GNU C / C++ normally has zero extra cost:
asm volatile("" :: "r"(my_var)); compiles to zero asm instructions, but requires the compiler to have the value of my_var in a register of its choice. (And because of asm volatile, has to "run" that many times in the C++ abstract machine).

This will only impact optimization if the compiler could have transformed the calculation it was part of into something else. (e.g. using this on a loop counter would stop the compiler from using just pointer increments and a compare against an end pointer to do the right number of iterations of for(i=0;i<n;i++) sum+=a[i];)
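A sketch of that loop-counter case (my example, not from Celero):

```cpp
int sum_array(const int* a, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        // Demanding i in a register each iteration blocks the usual
        // strength-reduction to a pointer increment + end-pointer compare.
        asm volatile("" : : "r"(i));
        sum += a[i];
    }
    return sum;
}
```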

Using a read-modify-write operand like asm volatile("" : "+r"(my_var)); would force the compiler to forget all range-restriction or constant-propagation info it knows about the value (e.g. that it's 42, or that it's non-negative), and treat it like an incoming function arg. This could impact optimization more.
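For example, this (hypothetical) snippet keeps a division by a known power of 2 from being strength-reduced:

```cpp
int divide(int x) {
    int divisor = 16;
    // "+r" tells the compiler the asm may have changed divisor, so it must
    // forget that it's 16 and emit an actual division instruction instead
    // of the cheap shift-based sequence it would otherwise use.
    asm volatile("" : "+r"(divisor));
    return x / divisor;
}
```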


When they say the "overhead is cancelled out in comparisons", they're hopefully not talking about explicitly subtracting anything from a single timing result, and not talking about benchmarking DoNotOptimizeAway on its own.

That wouldn't work. Performance analysis for modern CPUs does not work by adding up the costs of each instruction. Out-of-order pipelined execution means that an extra asm instruction can easily have zero extra cost if the front-end (total instruction throughput) wasn't the bottleneck, and if the execution unit it needs wasn't either.

If their portable definition is something like volatile T sink = input;, the extra asm store would only have a cost if your code bottlenecked on store throughput to cache.
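i.e. something along these lines (my guess at such a portable fallback, not Celero's actual source):

```cpp
// Portable fallback: the store to a volatile object is an observable side
// effect the compiler can't remove, but unlike the inline-asm version it
// costs a real store on every call.
template <class T>
void DoNotOptimizeAway(T const& input) {
    volatile T sink = input;
    (void)sink;
}
```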

So that claim about cancelling out sounds a bit optimistic: as I explained above, the cost of a DoNotOptimizeAway depends on the surrounding context and on which optimizations it happens to block, so there's no guarantee it adds the same amount of time to the benchmark as to the baseline.


Related Q&As about the same functions:

– Peter Cordes