For a benchmarking application I want to write a C++20 function template like this:
template<size_t n>
void noop() { /* ??? */; }
That when instantiated and executed:
- has no side effects.
- takes n nanoseconds to execute.
- won't be optimized out.
This is of course impossible to do exactly as there might be no sequence of instructions that deterministically takes exactly n nanoseconds to complete on some processors, but what's the best approximation we could do?
For large n (say > 100000) we can just use std::this_thread::sleep_for. Although it only guarantees a delay of at least n, in practice it's usually within 100 microseconds or so of the requested time.
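For concreteness, the sleep-based variant I have in mind for large n looks roughly like this. It is only a sketch: the name noop_sleep is made up for illustration, and how far it overshoots depends on the OS scheduler.

```cpp
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical name, for illustration only.
// Blocks the calling thread for at least n nanoseconds; the actual
// delay may be longer because of scheduling and timer granularity.
template <std::size_t n>
void noop_sleep() {
    std::this_thread::sleep_for(std::chrono::nanoseconds(n));
}
```

Calling noop_sleep<2'000'000>() should block for roughly 2 ms or a little more. Since sleep_for is a call into the library and OS, I would not expect it to be optimized away.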
But what about for small n?
And is there a way to achieve (3) from the function declaration/definition of noop? Is there a way to declare it so that it is not optimized out?
I'd be curious to know how far we could get with portable / standard-compliant C++20, in addition to how far we could get if we restricted ourselves to x86-64 and a particular implementation (msvc, clang, gcc).
Any ideas? For n between 100 and 100000, what about polling rdtsc? What about n < 100?
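To make the polling idea concrete, here is a portable stand-in that polls std::chrono::steady_clock instead of rdtsc. This is a minimal sketch, not a claim about the best approach: noop_spin is a made-up name, and I am assuming a single clock read is cheap compared with n, which probably stops being true somewhere below 100 ns.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical name, for illustration only.
// Busy-waits until at least n nanoseconds have passed on steady_clock.
// The resolution is limited by the cost of one clock::now() call, so
// this is not expected to be accurate for very small n.
template <std::size_t n>
void noop_spin() {
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + std::chrono::nanoseconds(n);
    while (clock::now() < deadline) {
        // Spin: the repeated clock reads are what keep the loop alive.
    }
}
```

With this, noop_spin<50'000>() should return after about 50 microseconds without yielding to the scheduler. An rdtsc-based version would have the same shape, but would need the TSC frequency to convert nanoseconds into ticks.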