2

I searched many related questions, but the answers didn't quite match what I was looking for. I'll try to explain the issue as best I can.

Basically, when compiling in release mode the compiler seems to remove anything it considers redundant or dead code, so the benchmark ends up measuring nothing. One suggested fix was to store the result in a variable, but then the compiler just removes the loop and stores the final value directly, it seems.

Now I do want the optimizations that improve the code, but I still want everything it was originally doing, e.g. if I make it loop 100,000 times I expect it to actually perform the code 100,000 times. I'm not sure how to configure Visual Studio 2010 so that it does only minimal optimizations when compiling in release mode. I'd very much like to time something accurately, but I'm not sure how.

At first I thought running in debug mode without debugging might fix the issue, and it very much seemed to, since the results matched those of a Java application. But when running in release mode the results were insanely faster, which confuses me. I'm not sure if C++'s optimizations are just that much better or if a large amount of the code has been removed.

Is there also a way to disassemble the code and view what the compiler compiled it into? That would be another test I'd like to try, but I don't know much about this stuff, and any pointer in the right direction would be greatly appreciated. Thanks to anyone who can make sense of what I'm asking; I'll be glad to reply to any questions regarding misunderstandings or uncertainties.

Jeremy Trifilo
  • 456
  • 6
  • 11

5 Answers

4

So, to avoid the compiler optimizing away all of your code, you need to make sure you "use" the result of what you do in your code.

The other trick is to put the code under test in a separate file, so the compiler can't inline it - the function is outside the file - (unless you enable "whole program optimisation").

I often use function pointers - not so much because it prevents optimization [although it often does], but because it gives me a good basis for doing several tests with the same basic "measure how long it took and print out the results", by having a table, looking a bit like this:

 #include <cstdio>    // printf
 #include <cstddef>   // size_t

 typedef void (*funcptr)(void);

 // The functions under test - defined in a separate file.
 void baseline(void);
 void better1(void);
 void worse1(void);

 #define FUNC(f) { f, #f }

 struct func_entry
 {
      funcptr func;
      const char *name;
 };

 func_entry func_table[] =
 {
      FUNC(baseline),
      FUNC(better1),
      FUNC(worse1),
 };

 void do_benchmark()
 {
     // timestamp, now() and timestamp_to_seconds() are placeholders - see below.
     for (size_t i = 0; i < sizeof(func_table)/sizeof(func_table[0]); i++)
     {
          timestamp t = now();
          func_table[i].func();
          t = now() - t;

          printf("function %s took %8.5fs\n", func_table[i].name,
                 timestamp_to_seconds(t));
     }
 }

Obviously, you'll need to replace now() with some suitable time-fetching function, and timestamp with the relevant type for that function, and timestamp_to_seconds with something that works...

vallentin
  • 23,478
  • 6
  • 59
  • 81
Mats Petersson
  • 126,704
  • 14
  • 140
  • 227
0

Not only do you need to use the result of the call you are timing, you should use the result from every iteration of the loop. The goal is to make sure the result from every iteration is used, without imposing much overhead beyond what you are trying to test.

A typical approach would be to accumulate the sum of all the method calls, for something that returns an integral value. This can be extended to methods that don't return ints by calling some other method that returns an int. For example, if your method creates std::strings, call size() on the returned string, which should be very fast. In C++ you can use the address-of operator & as a quick way of turning almost anything into an integer.

In some cases, the compiler might still be able to see through your tricks and hoist the actual method out of the loop, which degenerates to adding a bunch of values or even one big multiplication.

You can avoid this by iterating over some kind of input generated at runtime - the compiler won't be able to constant fold your loop away. Using function pointers can work too, but adds additional overhead and some compilers (and more in the future) can probably still look through them.

The whole time you are doing this, you have to ask yourself what you are really measuring. Very small methods measured in a loop don't necessarily give a good indication of how they'll perform in real life when loops are more complicated. This applies on both sides of the optimization spectrum - e.g., the "hoisting" that you are trying to avoid in your benchmark might actually be occurring in real code, so your benchmark is too pessimistic. Conversely, things like 8K lookup tables which always hit L1 in your benchmark might incur a bunch of misses in real code.

In summary - microbenchmarks are a tool to be used carefully. You can definitely measure something, once you understand how to prevent the optimizations you think are unrealistic in practice - but you should always evaluate some real-world use cases as a sanity check (understanding that huge differences in microbenchmarks invariably translate to much smaller improvements in a large program, where the tested method is a smaller portion of the runtime to begin with).

BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
0

What I do is have a benchmark() function which sits in a DLL, and pass a function pointer to it. The best way to prevent a compiler from optimizing out a benchmark loop is to make it impossible for it to do so, and putting it in a separate DLL absolutely prevents this. A separate translation unit won't do anymore, with LTCG becoming common.

First: some quick setup. Set the thread's affinity to a single core, and give it high priority. This will prevent a lot of possible variation due to context switching or cache thrashing. And don't forget to use high-precision timers like QueryPerformanceCounter for timings -- things which don't rely on the system time.

Next, call the function pointer in a loop until two seconds have passed. This will warm up the code/data in cache, give you a rough idea of its speed, and allow you to automatically derive a sensible loop count. I choose a loop count which will finish in 1 second.

Next, the actual benchmarking: maintain a counter that is incremented every time a run fails to improve on the best time so far; when a run does improve it, reset the counter. When the counter reaches some set number, stop and assume we've found the best time.

Note this won't tell you what to optimize like a proper profiler will do, but it should give quite accurate results if your only goal is to compare one piece of code to another.

Cory Nelson
  • 29,236
  • 5
  • 72
  • 110
0

It depends on what your code is trying to do and how you expect it to behave.

  1. Is your code doing I/O (network, disk, audio, whatever)?
  2. Is it multi-threaded?
  3. Is it dealing with memory a lot?
  4. Are you on a time critical loop?
  5. Is it processor intensive?

Generally, you want to run in release mode and use a profiler to see how your code performs in a valid use/test case. Even then, you need to know what profiler to use and again, this depends on what problem you're trying to solve.

Most importantly, don't go looking for a problem when there isn't one. Trust your compiler. Trying to outsmart your compiler these days is quite a futile (and foolish) exercise. Your profiler will only spit out data. It is up to you to know what to look for and interpret it.

Carl
  • 43,122
  • 10
  • 80
  • 104
  • I visited the link and downloaded the one called "CodeAnalyst Profiler". I'll try it out and comment back once I've figured it out. – Jeremy Trifilo Feb 04 '13 at 17:35
  • Remember to make sure that your debugging symbols aren't stripped when profiling in release mode. – Carl Feb 04 '13 at 19:40
  • I actually had issues trying to get it to work. I'm a bit new to the profiling thing. If it helps, the reason I'm asking this is I made a post a while ago. I thought I had made something really amazing, but after some testing I wasn't so sure it was even that good, and quite frankly it might have been even slower. LINK BELOW: http://stackoverflow.com/questions/5777110/fast-implementation-of-trigonometric-functions-for-c/14330468#14330468 – Jeremy Trifilo Feb 09 '13 at 06:21
-3

Assuming you are using Visual Studio (Visual C++), set it to Debug profile, right click the project -> properties and go to C/C++ -> Optimization. Make sure it is disabled.

You can then just use a stopwatch, or the unix program time, to count how long your program runs.
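For the whole-program approach, the unix `time` utility is a one-liner (here `sleep 1` stands in for your compiled benchmark binary, since the real program name isn't given):

```shell
# Time an entire program run; wall-clock, user and system time are reported.
time sleep 1
```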

Of course there are more sophisticated ways of analyzing performance - such as using a profiler.

gerrytan
  • 40,313
  • 9
  • 84
  • 99