How to make sure the call is not optimised away when measuring time?

Question

I wrote a function template to measure time:

#include <ctime>
template <typename FUNCTION,typename INPUT,int N>
double measureTime(FUNCTION f,INPUT inp){
  // double x;
  double duration = 0;
  clock_t begin = clock();
  for (int i=0;i<N;i++){
      // x = f(inp);  
      f(inp);
  }
  clock_t end = clock();
  // std::cout << x << std::endl;
  return double(end-begin) / CLOCKS_PER_SEC;
}

And I use it like this:

#include <iostream>
#include <vector>
typedef std::vector<double> DVect;
double passValue(DVect a){
    double sum = 0;
    for (int i=0;i<a.size();i++){sum += sum+a[i];}
    return sum;
}
typedef double (*passValue_type)(DVect);

int main(int argc, char *argv[]) {
    const int N = 1000;
    const int size = 10000;
    std::vector<double> v(size,0);
    std::cout << measureTime<passValue_type,DVect,N>(passValue,v) << std::endl;
}

The aim is to reliably measure the CPU time of different functions, e.g. pass-by-value vs. pass-by-reference. It seems to work nicely, but sometimes the resulting time is too short to be measured and I just get 0 as the result. To make sure the function is called, I printed the result of the call (see the commented-out lines in the code above). I would like to avoid this and keep the template as simple as possible, so my question is:

How can I make sure that the function is really called and not optimised away (because the return value is not used)?

You could try applying some side effect, such as updating a static local variable that is used as the return value. Of course, that wouldn't work if you actually want to use the returned value in a multi-threaded application. — πάντα ῥεῖ, Sep 04 '15 at 19:09
Maybe this will work: http://stackoverflow.com/questions/7083482/how-to-prevent-gcc-from-optimizing-out-a-busy-wait-loop — NathanOliver, Sep 04 '15 at 19:11
As I see it, your `measureTime` function tells you how much time your code is actually taking to execute. If your concern is that pass by value is too slow, then you should benchmark the actual code for which you care, in which you are fairly sure it will not be optimized out, and you won't have this problem. — Brian Bi, Sep 04 '15 at 19:14
@Brian I am not concerned about it, I just wanted to experiment a bit to get a quantitative feeling for the difference, but my test cases take either annoyingly long for pass by value or too little to be measured for pass by reference. — 463035818_is_not_an_ai, Sep 04 '15 at 19:26

score 3 · Accepted Answer · edited Sep 04 '15 at 19:42

I typically do something like this:

#include <ctime>
#include <iostream>

template <typename FUNCTION,typename INPUT,int N>
double measureTime(FUNCTION f,INPUT inp){
  double x = 0;
  double duration = 0;
  clock_t begin = clock();
  for (int i=0;i<N;i++){
      x += f(inp);  
  }
  clock_t end = clock();
  std::cout << x << std::endl;
  // or if (x < 0) cout << x; or similar.
  // such that it doesn't ACTUALLY print anything.
  return double(end-begin) / CLOCKS_PER_SEC;
}

The above assumes that f actually does something non-trivial that the compiler can't figure out how to simplify. If f is just return 6; then the compiler can reduce the loop to x = 6 * N;, and you get a very short runtime indeed.

If you want to be able to use "any" function, you will have to do some more clever stuff:

template <typename FUNCTION,typename INPUT,int N, typename RET>
double measureTime(FUNCTION f,INPUT inp){
  RET x = 0;
  double duration = 0;
  clock_t begin = clock();
  for (int i=0;i<N;i++){
      x += f(inp);  
  }
  clock_t end = clock();
  std::cout << x << std::endl;
  return double(end-begin) / CLOCKS_PER_SEC;
}

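// Overload for functions returning void: it has no RET parameter, so it is
// the one selected when only three template arguments are given.
template <typename FUNCTION,typename INPUT,int N>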
double measureTime(FUNCTION f,INPUT inp){
  clock_t begin = clock();
  for (int i=0;i<N;i++){
      f(inp);  
  }
  clock_t end = clock();
  return double(end-begin) / CLOCKS_PER_SEC;
}

[I haven't actually compiled the above code, so it may have minor flaws, but as a concept it should work].

Since any meaningful void function will have to do something that affects the surrounding world (output to a stream, change a global variable or call some system call), it won't be eliminated. Of course, calling an empty function or similar is likely to cause trouble.
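The choice between the two cases above (a result to consume, or a void function) can also be made by the compiler. The following is only a sketch, assuming C++11: the name measureTimeImpl and the reordering that puts N first (so that FUNCTION and INPUT can be deduced) are illustrative choices, not part of the answer's code.

```cpp
#include <ctime>
#include <iostream>
#include <type_traits>

// Non-void case: accumulate the results and print the sum so they are used.
template <int N, typename FUNCTION, typename INPUT>
double measureTimeImpl(FUNCTION f, const INPUT& inp, std::false_type) {
    typedef typename std::decay<decltype(f(inp))>::type RET;
    RET x = RET();
    std::clock_t begin = std::clock();
    for (int i = 0; i < N; i++) {
        x += f(inp);
    }
    std::clock_t end = std::clock();
    std::cout << x << std::endl;
    return double(end - begin) / CLOCKS_PER_SEC;
}

// Void case: there is no result, so the function is only called.
template <int N, typename FUNCTION, typename INPUT>
double measureTimeImpl(FUNCTION f, const INPUT& inp, std::true_type) {
    std::clock_t begin = std::clock();
    for (int i = 0; i < N; i++) {
        f(inp);
    }
    std::clock_t end = std::clock();
    return double(end - begin) / CLOCKS_PER_SEC;
}

// Entry point: picks the overload from the deduced return type of f(inp).
template <int N, typename FUNCTION, typename INPUT>
double measureTime(FUNCTION f, INPUT inp) {
    typedef decltype(f(inp)) RET;
    return measureTimeImpl<N>(f, inp, std::is_void<RET>());
}
```

With the question's code, the call would then read measureTime<N>(passValue, v). The non-void branch still requires the return type to support += and <<, so it does not cover every possible function.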

Another method, assuming you don't mind that the call is not inlined, is to place the function under test in a separate file, so that the compiler cannot "see" that function from the code that measures the time [and not to use -flto, which would allow it to inline the function at link time]. That way, the compiler can't KNOW what the function under test is doing, and cannot eliminate the call.

It should be noted that there is really no way to GUARANTEE that the compiler doesn't eliminate a call, other than either "make it impossible for the compiler to know what the outcome of the function is" (for example use random/externally sourced input), or "don't let the compiler know what the function does".
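One way to approximate "externally sourced input" without reading a file or the command line is to pass the values through a volatile object. This is a rough sketch, not a guarantee: the helper names opaque and makeOpaqueInput are made up for illustration, and it relies on the compiler having to perform every volatile read instead of assuming the value.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns value, but reads it back from a volatile
// object, so the optimiser cannot assume what comes out.
inline double opaque(double value) {
    volatile double tmp = value;
    return tmp;
}

// Builds a benchmark input whose contents are not known at compile time.
inline std::vector<double> makeOpaqueInput(std::size_t size, double fill) {
    std::vector<double> v(size);
    for (std::size_t i = 0; i < size; ++i) {
        v[i] = opaque(fill);
    }
    return v;
}
```

In the question's main, v could then be built as makeOpaqueInput(size, 0) instead of std::vector<double> v(size,0). The result of the function under test still has to be used somewhere, otherwise the call can be dropped regardless of the input.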

Yes, that works, but this is what I would like to avoid. The template should work for any return type, which I thought is easiest when the return value is simply ignored. — 463035818_is_not_an_ai, Sep 04 '15 at 19:09
Note that clever compilers may still be able to constant-fold this ([I've seen clang doing something like that](https://stackoverflow.com/questions/15114140/writing-binary-number-system-in-c-code)). — The Paramagnetic Croissant, Sep 04 '15 at 19:14
Then add the returntype of the function, and specialize the `void` case - since a `void` function must "do something else" [unless it's not actually doing anything at all]. I'll edit an example of that. — Mats Petersson, Sep 04 '15 at 19:14
I am close to accepting the answer, but I still have a small doubt: what if the return type has no `operator+`? — 463035818_is_not_an_ai, Sep 04 '15 at 19:31
Again, you'll probably have to specialize that one too - but the problem with that is that you really need to USE the result - if `N` is small enough, you could make a local array that stores the output from the function for example [and then uses all the output] - or sum up the `x.size()` if it's some container. It's almost impossible to cover EVERY eventuality - what if it's a non-copyable return, etc, etc. I don't think there is ONE way that works for EVERYTHING. — Mats Petersson, Sep 04 '15 at 19:34
now I am convinced ;) I was hoping for a simple trick, but if there is none I will just stay with stupid `std::cout`s. Thanks for your effort. — 463035818_is_not_an_ai, Sep 04 '15 at 19:37
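The suggestion from the comments to sum up the size() of a container result might look roughly like the sketch below. The name measureTimeSized is invented here, and the approach assumes the return type has a size() member. Since only the size is consumed, a sufficiently clever compiler may still skip the work that fills the elements.

```cpp
#include <cstddef>
#include <ctime>
#include <iostream>

// Variant for functions that return a container: the sizes of all results
// are summed and printed, so each result is at least partly used.
template <int N, typename FUNCTION, typename INPUT>
double measureTimeSized(FUNCTION f, INPUT inp) {
    std::size_t total = 0;
    std::clock_t begin = std::clock();
    for (int i = 0; i < N; i++) {
        total += f(inp).size();
    }
    std::clock_t end = std::clock();
    std::cout << total << std::endl;
    return double(end - begin) / CLOCKS_PER_SEC;
}
```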

score 0 · Answer 2 · edited May 23 '17 at 11:58

Without inlining: Make sure that the function call and the function definition are in separate compilation units (i.e. cpp-files), then disable link-time optimization in your build.

In this case the compiler will not be able to inline your function call, because of how compilation units work in C++. It will also not be able to remove the call completely. In fact, it knows nothing about your function (except for the signature) at the moment it optimizes the call.

With inlining: The simple way described above will not work if you want to measure time with your function call inlined. In that case you have to make sure that for each operation inside your function there is some observable behavior that depends on it. You can for example write your results to volatile variables, or calculate some sum/hash of the results and print it to stdout.
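Writing the results to a volatile variable could be sketched as follows. The name measureTimeVolatile is hypothetical and the sketch is limited to arithmetic return types. Note that the compiler must perform each volatile write, but if it can prove that f has no side effects and the input never changes, it may still compute f(inp) once and store that value N times.

```cpp
#include <ctime>
#include <type_traits>

// Stores every result in a volatile variable. Volatile writes are observable
// behavior, so the value written has to be computed.
template <int N, typename FUNCTION, typename INPUT>
double measureTimeVolatile(FUNCTION f, INPUT inp) {
    typedef typename std::decay<decltype(f(inp))>::type RET;
    static_assert(std::is_arithmetic<RET>::value,
                  "this sketch only covers arithmetic return types");
    volatile RET sink = RET();
    std::clock_t begin = std::clock();
    for (int i = 0; i < N; i++) {
        sink = f(inp);
    }
    std::clock_t end = std::clock();
    return double(end - begin) / CLOCKS_PER_SEC;
}
```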

score 0 · Answer 3 · Maxim Egorushkin · edited Sep 14 '15 at 11:52

Many compilers have extensions to disable inlining of a function. For gcc, it is __attribute__((noinline)), e.g.:

__attribute__((noinline)) void foo() { ... }

Boost provides a portable BOOST_NOINLINE macro.
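A small sketch of how this might be used follows. The macro name MY_NOINLINE and the functions sumTo and timeSumTo are made up for illustration. The GCC documentation for noinline notes that calls to a function without side effects can still be optimized away by other passes, and suggests putting asm (""); in the called function as a special side effect, which is what the sketch does.

```cpp
#include <ctime>

// Hypothetical portability macro; BOOST_NOINLINE serves the same purpose.
#if defined(__GNUC__)
#define MY_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define MY_NOINLINE __declspec(noinline)
#else
#define MY_NOINLINE
#endif

// Example function under test: sums the integers 1..n.
MY_NOINLINE double sumTo(int n) {
#if defined(__GNUC__)
    asm("");  // empty asm acts as a side effect, so calls are not removed
#endif
    double sum = 0;
    for (int i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}

// Times the given number of calls to sumTo(n), ignoring the return value.
double timeSumTo(int repetitions, int n) {
    std::clock_t begin = std::clock();
    for (int i = 0; i < repetitions; i++) {
        sumTo(n);
    }
    std::clock_t end = std::clock();
    return double(end - begin) / CLOCKS_PER_SEC;
}
```

On compilers where neither branch of the macro applies, the attribute expands to nothing and the call may be inlined or removed again.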

answered Sep 14 '15 at 11:46

You can see a macro defined for GCC/MSVC compilers [here](http://pastebin.com/MsJuWzwW). – stgatilov Sep 14 '15 at 11:49
