Benchmarking and instruction reordering

Question

I've been using, up until now, the traditional way to benchmark concurrent methods, which is to measure the elapsed duration for a number of runs:

#include <cstddef> // std::size_t
#include <ctime>   // std::time, std::difftime

template <typename Functor>
double benchmark(Functor const& f, std::size_t nbRuns)
{
  if (nbRuns == 0) { return 0.0; }

  f(); // Warm up before measuring; I am not interested in setup cost

  std::time_t begin = std::time(0);
  for (std::size_t i = 0; i != nbRuns; ++i) { f(); }
  std::time_t end = std::time(0);

  return std::difftime(end, begin); // elapsed seconds
}

which seemed all fine and dandy until I came upon this question: Optimizing away a "while(1);" loop in C++0x.

What strikes me as unusual is that the compiler is allowed to execute the output BEFORE the loop... and I am suddenly wondering:

What prevents the compiler from executing time_t end = time(0); before the loop here ?

because if it did, that would somehow screw my little benchmark code.

And while we are at it, if ever the reordering could occur in this situation:

How can one prevent it ?

I could not think of relevant tags apart from the C++ ones; if anyone thinks I've missed one, feel free to add it.

What is your compiler? Are you sure that the loop is still running after the output? You should insert a printf inside the loop and test. Does f() create another thread? What runs after the output may be other threads. – Squall Nov 22 '10 at 16:36
@Squall: it doesn't happen, or at least I do not think it does, but I was wondering whether it works by mere chance. – Matthieu M. Nov 22 '10 at 19:32

score 6 · Accepted Answer · answered Nov 22 '10 at 16:44

This is a tricky question.

What prevents the compiler from executing time_t end = time(0); before the loop here ?

Generally, nothing, and that was already true in C++03. Because of the as-if rule, the compiler may emit any code that has the same observable behaviour. That means that if omitting f() doesn't change any specified input/output or any volatile accesses, the compiler need not run f() at all.

What strikes me as unusual is that the compiler is allowed to execute the output BEFORE the loop

That's not really true: the issue with the empty loop is that C++0x doesn't count mere nontermination as observable behavior. It's not that the compiler can reorder the empty loop and the output of "Hello"; rather, it can leave out the empty loop altogether.
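To make the as-if point concrete, here is a minimal sketch of one way to keep the work observable, so it cannot simply be dropped. It is not from the original post: `work` and `benchmark_observed` are hypothetical names, it assumes C++11 for `std::chrono`, and it assumes f takes an input and returns a value. The input is read from a volatile and the result is stored to a volatile, both of which count as observable behaviour. The clock calls are opaque library calls, so in practice compilers keep them on either side of the loop, but treat that as an assumption rather than something this code proves.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical pure workload standing in for f(); not from the original post.
inline unsigned long work(unsigned long seed)
{
  unsigned long acc = seed;
  for (unsigned i = 0; i < 1000; ++i) { acc = acc * 31u + i; }
  return acc;
}

// Sketch: times nbRuns calls of f and returns the elapsed seconds.
template <typename Functor>
double benchmark_observed(Functor const& f, std::size_t nbRuns)
{
  if (nbRuns == 0) { return 0.0; }

  // Volatile read: the optimizer cannot assume it knows the input value.
  volatile unsigned long input = 1;
  // Volatile write: every store is observable, so results cannot be discarded.
  volatile unsigned long sink = f(input); // warm-up run, not measured

  auto const begin = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i != nbRuns; ++i) { sink = f(input); }
  auto const end = std::chrono::steady_clock::now();

  return std::chrono::duration<double>(end - begin).count();
}
```

Calling `benchmark_observed(work, 100000)` returns the elapsed time in seconds. The volatile accesses add a small cost per iteration, so this suits workloads that are much more expensive than a single load and store.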

jpalecek


Effectively, `f` is pure computation and I do not use the result, which is what got me thinking. It does call non-inline functions and I don't think LTO could spot their (non)purity, which probably prevents the optimization here (I guess...) – Matthieu M. Nov 22 '10 at 19:34
@Matthieu M.: Yes, it's likely it would work. However, the standard doesn't require such behavior (it would be quite hard to specify). The compilers are not that eager to throw out non-inline function calls, but that could easily change in the future. – jpalecek Nov 22 '10 at 21:06

CashCow · Answer 2 · 2013-02-26T12:07:48.060

Normally I would put my timer into a scope using an object so it calculates the "end" in its destructor when it goes out of scope.

Would the compiler be allowed to execute its destructor whilst still in the scope? I don't know.

Of course, time_t typically only measures whole seconds, so I would normally measure at a finer grain, usually milliseconds. Sometimes milliseconds are not fine enough (e.g. very small functions that are called many times), in which case you would probably use microseconds.

In this case there is some overhead in entering and leaving the scope itself, but it is a reasonable measure for "intrusive" profiling, which is often very good for optimising real cases. (You can often switch the feature on and off.)
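A scoped timer along these lines could look roughly like the sketch below. It is only an illustration: `ScopedTimer` and `time_busy_loop` are hypothetical names, and it assumes C++11's `std::chrono::steady_clock` with microsecond reporting. It does not settle whether the compiler may move work across the destructor; the volatile store in the timed region is just there so that the loop has observable behaviour.

```cpp
#include <chrono>

// Hypothetical helper: writes the elapsed microseconds into `out` when the
// object goes out of scope.
class ScopedTimer
{
public:
  explicit ScopedTimer(long long& out)
    : out_(out), begin_(std::chrono::steady_clock::now()) {}

  ScopedTimer(ScopedTimer const&) = delete;
  ScopedTimer& operator=(ScopedTimer const&) = delete;

  ~ScopedTimer()
  {
    auto const end = std::chrono::steady_clock::now();
    out_ = std::chrono::duration_cast<std::chrono::microseconds>(end - begin_).count();
  }

private:
  long long& out_;
  std::chrono::steady_clock::time_point begin_;
};

// Usage sketch: times a trivial loop and returns the elapsed microseconds.
inline long long time_busy_loop(unsigned iterations)
{
  long long elapsed_us = -1; // overwritten when the timer is destroyed
  {
    ScopedTimer timer(elapsed_us);
    volatile unsigned sink = 0; // volatile stores keep the loop observable
    for (unsigned i = 0; i < iterations; ++i) { sink = i; }
  } // timer destructor runs here
  return elapsed_us;
}
```

Writing the result through a reference keeps the timer itself small. In an intrusive profiler the destructor would more likely add to a named counter that can be switched on and off.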
