
I am using Google Benchmark to benchmark a library. I have the benchmark set up as follows:

for (auto _ : state) {
    run_function(first, last, v);
}

What I would like is for v to be randomly generated every iteration so that I could get a range of benchmark values and obtain the statistics from them. I can do this via:

std::random_device rand_dev;
std::mt19937 generator(rand_dev());
std::uniform_int_distribution<int> distr(min, max);
for (auto _ : state) {
    v = distr(generator);
    run_function(first, last, v);
}

Some of the functions I am testing are on the order of 10-100ns, so adding in the generator has a significant effect on the results. Is there any way to tell Google Bench to skip a line/block of code?

Throckmorton

2 Answers


You can use the PauseTiming and ResumeTiming methods on the State object to pause and resume timing. However, they add some overhead to the timing loop, which you may notice if the function under benchmark is very fast.

dma

No, Google Benchmark times multiple iterations of the whole loop; it does not insert timer start/stop around individual statements or even whole iterations.

Timing is expensive, too: even rdtsc on x86 takes ~20 clock cycles, and getting useful results from it would require serializing out-of-order execution (e.g. with lfence), destroying performance for many kinds of loops where each iteration does independent work. So it's really not very viable or realistic to time short functions on their own. Google Benchmark leaves it up to you to construct a loop that benchmarks either throughput (independent iterations) or latency (feed the output of one call into an input of the next iteration).
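To make the throughput/latency distinction concrete, here is a minimal sketch of the two loop shapes. `work` is a hypothetical stand-in for the function under test, and both loop functions are illustrative names rather than anything from Google Benchmark; in a real benchmark the loop body would sit inside `for (auto _ : state)`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the function being benchmarked.
static std::uint64_t work(std::uint64_t x) {
    return x * 6364136223846793005ULL + 1442695040888963407ULL;
}

// Throughput shape: every call takes an input that does not depend on
// any earlier result, so the CPU can overlap consecutive calls.
std::uint64_t throughput_loop(std::uint64_t n) {
    std::uint64_t sink = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        sink ^= work(i);  // accumulate so the calls are not optimized away
    }
    return sink;
}

// Latency shape: each call consumes the previous call's output, so the
// calls form one serial dependency chain.
std::uint64_t latency_loop(std::uint64_t n) {
    std::uint64_t x = 1;
    for (std::uint64_t i = 0; i < n; ++i) {
        x = work(x);
    }
    return x;
}
```

Which shape is appropriate depends on how the real function is called in practice; the two can give very different per-call times for the same function.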

You have to figure out how to create a whole loop you want to benchmark if you want meaningful results for very small amounts of work. You can get results with PauseTiming and ResumeTiming, but they will massively distort things if the timed portion of each iteration isn't at least a few hundred asm instructions, preferably much larger than the CPU's out-of-order exec window.

In this case, use a very cheap PRNG like xorshift+.
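As a rough sketch of what that could look like, below is a small xorshift128+ generator using one common choice of shift constants (23, 18, 5). The struct name, the seeding scheme, and the `next_in_range` helper are assumptions made for illustration; none of them come from Google Benchmark or the standard library.

```cpp
#include <cstdint>

// Minimal xorshift128+ generator (shift constants 23, 18, 5).
// Illustrative sketch only; not statistically rigorous seeding.
struct XorShift128Plus {
    std::uint64_t s0;
    std::uint64_t s1;

    // The state must never be all zero, otherwise every output is 0.
    explicit XorShift128Plus(std::uint64_t seed_a,
                             std::uint64_t seed_b = 0x9E3779B97F4A7C15ULL)
        : s0(seed_a), s1(seed_b) {
        if (s0 == 0 && s1 == 0) s1 = 1;
    }

    // One step: a few shifts, xors and an add.
    std::uint64_t next() {
        std::uint64_t x = s0;
        const std::uint64_t y = s1;
        const std::uint64_t result = x + y;
        s0 = y;
        x ^= x << 23;
        s1 = x ^ y ^ (x >> 18) ^ (y >> 5);
        return result;
    }

    // Hypothetical helper: maps the output into [lo, hi] with a modulo.
    // This is slightly biased, which is usually acceptable for
    // generating benchmark inputs. Requires lo <= hi.
    int next_in_range(int lo, int hi) {
        const std::uint64_t span =
            static_cast<std::uint64_t>(static_cast<std::int64_t>(hi) - lo) + 1;
        return static_cast<int>(lo + static_cast<std::int64_t>(next() % span));
    }
};
```

In the benchmark loop from the question, `v = distr(generator);` would become something like `v = rng.next_in_range(min, max);`, with `rng` constructed once before the loop. The generator's cost is still inside the timed region, but it is only a handful of simple integer operations per call.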

See also "Understanding the impact of lfence on a loop with two long dependency chains, for increasing lengths" re: the impact of serializing execution. Also "Idiomatic way of performance evaluation?" re: benchmark pitfalls.

Peter Cordes