6

I'm trying to find a way to test how long it takes a block of C++ code to run. I'm using it to compare implementations with different algorithms and in different languages, so ideally I would like a time in seconds / milliseconds. In Java I'm using something like this:

long startTime = System.currentTimeMillis();

function();

long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime; 

Is there a good way to get an accurate time like that in C++ (Or should I use some other means of benchmarking)?

Jeremy
  • 2,826
  • 7
  • 29
  • 25
  • 4
    Related question: http://stackoverflow.com/questions/275004/c-timer-function-to-provide-time-in-nano-seconds – Andy White Apr 28 '09 at 00:55
  • 1
    Timing is platform-dependent. You should list which platform(s) you're using. – David Thornley Jul 14 '10 at 15:17
  • It's frustrating that none of the answers here have a statistical component. – uckelman Sep 28 '11 at 14:06
  • I'm going to go ahead and answer my own question by saying that the link () in the comment posted by [Andy White](http://stackoverflow.com/users/60096/andy-white) was what I was looking for. – Jeremy Jul 14 '10 at 15:10

14 Answers

9

Use the best counter available on your platform, and fall back to time() for portability. I am using QueryPerformanceCounter, but see the comments in the other reply.

General advice:

The inner loop should run for at least about 20 times the resolution of your clock, so that the resolution error is < 5%. (So when using time(), your inner loop should run for at least 20 seconds.)

Repeat these measurements, to see if they are consistent.

I use an additional outer loop, running ten times, and ignoring the fastest and the slowest measurement when calculating average and deviation. Deviation comes in handy when comparing two implementations: if you have one algorithm taking 2.0 ms +/- 0.5, and the other 2.2 +/- 0.5, the difference is not significant enough to call one of them "faster". (Max and min should still be displayed.) So IMHO a valid performance measurement should look something like this:

10000 x 2.0 +/- 0.2 ms (min = 1.2, max = 12.6), 10 repetitions

If you know what you are doing, purging the cache and setting thread affinity can make your measurements much more robust.

However, this is not without pitfalls. The more "stable" the measurement is, the less realistic it is as well. Any implementation will vary strongly with time, depending on the state of the data and instruction caches. I'm lazy here, using the max= value to judge the first-run penalty; this might not be sufficient for some scenarios.
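A rough sketch of that outer-loop scheme (my own illustration, not peterchen's code; the dummy workload and the clock()-based timer are just placeholders for the real inner loop):

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <ctime>
#include <vector>

// Placeholder for the timed inner loop: returns its duration in milliseconds.
double run_inner_loop()
{
    std::clock_t start = std::clock();
    volatile double x = 0;
    for (long i = 0; i < 10000000L; ++i)   // dummy workload
        x += i;
    return 1000.0 * (std::clock() - start) / CLOCKS_PER_SEC;
}

int main()
{
    std::vector<double> samples;
    for (int i = 0; i < 10; ++i)           // outer loop: 10 repetitions
        samples.push_back(run_inner_loop());

    std::sort(samples.begin(), samples.end());
    double min = samples.front(), max = samples.back();

    // Ignore the fastest and slowest sample, then compute mean and deviation.
    double sum = 0, sumsq = 0;
    int n = 0;
    for (std::size_t i = 1; i + 1 < samples.size(); ++i, ++n) {
        sum += samples[i];
        sumsq += samples[i] * samples[i];
    }
    double mean = sum / n;
    double dev = std::sqrt(sumsq / n - mean * mean);

    std::printf("%.1f +/- %.1f ms (min = %.1f, max = %.1f), %d repetitions\n",
                mean, dev, min, max, (int)samples.size());
}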

peterchen
  • 40,917
  • 20
  • 104
  • 186
8

Execute the function a few thousand times to get an accurate measurement.

A single measurement might be dominated by OS events or other random noise.
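For example, a minimal sketch of that (my illustration, using std::chrono from C++11; function() stands in for the code under test, as in the question):

#include <chrono>
#include <iostream>

void function() { /* the code under test */ }

int main()
{
    const int iterations = 10000;              // run the function many times

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        function();
    auto stop = std::chrono::steady_clock::now();

    // Average time per call, in milliseconds
    auto total_ms = std::chrono::duration<double, std::milli>(stop - start).count();
    std::cout << "average: " << total_ms / iterations << " ms per call\n";
}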

S.Lott
  • 384,516
  • 81
  • 508
  • 779
5

In Windows, you can use high performance counters to get more accurate results:

You can use the QueryPerformanceFrequency() function to get the number of high-frequency ticks per second, and then use QueryPerformanceCounter() before and after the function you want to time.

Of course, this method is not portable...
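A minimal sketch of that approach (my illustration, not part of the original answer):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER frequency, start, stop;

    QueryPerformanceFrequency(&frequency);   // high-frequency ticks per second
    QueryPerformanceCounter(&start);

    /* ... code to benchmark ... */

    QueryPerformanceCounter(&stop);

    double elapsed_ms = (double)(stop.QuadPart - start.QuadPart) * 1000.0 / frequency.QuadPart;
    printf("elapsed: %.3f ms\n", elapsed_ms);
    return 0;
}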

MartinStettner
  • 28,719
  • 15
  • 79
  • 106
  • Be careful with the HF counters. Multi-processor systems sometimes make their use...interesting. They work with processor-specific counters, so if you end up with your code on a different CPU, the counters can be off (depending on your exact hardware, of course). – Michael Kohne Apr 28 '09 at 01:05
  • 1
    Setting thread affinity to a single CPU for the duration of the benchmark will eliminate SMP-related time warps. There also may be clock ramping due to power management, which would matter if the CPU becomes idle because the benchmarked code sleeps or blocks on I/O. For AMD systems, installing the AMD Processor Driver will improve QPC() synchronization considerably. Windows Vista and Windows 7 use the HPET timer (if available) instead of the TSC, so the TSC problems may eventually go away (when/if Windows XP goes away). – bk1e Apr 28 '09 at 02:44
5

Have you considered actually using a profiler? Visual Studio Team System has one built in, but there are others available like VTune and GlowCode.

See also What's the best free C++ profiler for Windows?

Community
  • 1
  • 1
rlbond
  • 65,341
  • 56
  • 178
  • 228
  • Gah! Do they have ones that don't cost so much? – Billy ONeal Apr 28 '09 at 01:47
  • 4
    Profilers are good for answering the question "what is the slowest part of my program?" but usually not so good at answering the question "how slow is my program?" accurately. – bk1e Apr 28 '09 at 02:50
4

What's wrong with clock() and CLOCKS_PER_SEC? They are standard C89.

Something like (nicked from MSDN):

#include <stdio.h>
#include <time.h>

int main(void)
{
   long i = 6000000L;
   clock_t start, finish;
   double  duration;

   // Measure the duration of an event.
   printf( "Time to do %ld empty loops is ", i );
   start = clock();
   while( i-- )
      ;
   finish = clock();
   duration = (double)(finish - start) / CLOCKS_PER_SEC;
   printf( "%2.1f seconds\n", duration );
   return 0;
}
PowerApp101
  • 1,798
  • 1
  • 18
  • 25
2

OVERVIEW

I have written a simple semantic hack for this.

  • Easy to use
  • Code looks neat.

MACRO

#include <time.h>

#ifndef SYSOUT_F
#define SYSOUT_F(f, ...)      _RPT1( 0, f, __VA_ARGS__ ) // For Visual Studio
#endif

#ifndef speedtest__             
#define speedtest__(data)   for (long blockTime = NULL; (blockTime == NULL ? (blockTime = clock()) != NULL : false); SYSOUT_F(data "%.9fs", (double) (clock() - blockTime) / CLOCKS_PER_SEC))
#endif

USAGE

speedtest__("Block Speed: ")
{
    // The code goes here
}

OUTPUT

Block Speed: 0.127000000s
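If you are not building with Visual Studio (so _RPT1 is unavailable), one possible fallback, just a sketch and not part of the original macro, is to define SYSOUT_F in terms of fprintf instead:

#include <stdio.h>

#ifndef SYSOUT_F
#define SYSOUT_F(f, ...)      fprintf(stderr, f, __VA_ARGS__) // Portable fallback
#endif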
Mathew Kurian
  • 5,949
  • 5
  • 46
  • 73
2

You can use the time() function to get a timer with a resolution of one second. If you need more resolution, you could use gettimeofday(). The resolution of that depends on your operating system and runtime library.
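For example, a minimal sketch of the gettimeofday() approach on a POSIX system (my illustration, not part of the original answer):

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval start, stop;

    gettimeofday(&start, NULL);          // record the start time
    /* ... code to benchmark ... */
    gettimeofday(&stop, NULL);           // record the stop time

    // Convert the difference to milliseconds
    double elapsed_ms = (stop.tv_sec  - start.tv_sec)  * 1000.0 +
                        (stop.tv_usec - start.tv_usec) / 1000.0;
    printf("elapsed: %.3f ms\n", elapsed_ms);
    return 0;
}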

Greg Hewgill
  • 951,095
  • 183
  • 1,149
  • 1,285
2

I always use boost::timer or boost::progress_timer.

pseudo-code:

#include <boost/timer.hpp>
#include <iostream>

boost::timer t;

func1();
std::cout << "func1: " << t.elapsed() << "\n";

t.restart();
func2();
std::cout << "func2: " << t.elapsed() << "\n";
t.g.
  • 1,719
  • 2
  • 14
  • 25
2

If you want to check your performance, you should consider measuring the processor time used, not the wall-clock time you are measuring now. Otherwise you might get quite inaccurate times if some other application running in the background decides to do some heavy calculations at the same time. The functions you want are GetProcessTimes on Windows and getrusage on Linux.
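For instance, a minimal sketch of the getrusage() approach on Linux (my illustration, not part of the original answer):

#include <stdio.h>
#include <sys/resource.h>

// CPU time (user + system) consumed by this process so far, in seconds.
static double cpu_seconds(void)
{
    struct rusage usage;
    getrusage(RUSAGE_SELF, &usage);
    return (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
           (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / 1e6;
}

int main(void)
{
    double before = cpu_seconds();
    /* ... code to benchmark ... */
    double after = cpu_seconds();
    printf("CPU time: %.6f s\n", after - before);
    return 0;
}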

Also you should consider using profilers, as other people suggested.

n0rd
  • 11,850
  • 5
  • 35
  • 56
0

Your platform for deployment can have a serious impact on your clock precision. If you are sampling inside a virtual machine, all bets are off. The system clock in a VM floats in relation to the physical clock and has to be occasionally resynched. It is almost a certainty that this will happen, given the mischievous nature of Mr. Murphy in the software universe.

James Pulley
  • 5,606
  • 1
  • 14
  • 14
0

You can simply use time() within your code to measure with an accuracy of seconds. Thorough benchmarks should run many iterations for accuracy, so seconds should be a large enough margin. If you are using Linux, you can use the time utility provided by the command line like so:

[john@awesome]$time ./loops.sh

real    0m3.026s
user    0m4.000s
sys     0m0.020s
John T
  • 23,735
  • 11
  • 56
  • 82
0

On Unix systems (Linux, Mac, etc.), you can use the time utility like so:

$ time ./my_app
fengshaun
  • 2,100
  • 1
  • 16
  • 25
0

If your function is very fast, a good practice is to time the function in a loop and then subtract the loop overhead.

Something like this:

// getTime() stands in for whichever timer you use
// (clock(), gettimeofday(), QueryPerformanceCounter(), ...).
// Note: an optimizing compiler may remove the empty loop entirely.
int i;
int limit=1000000;
int t0=getTime();
for(i=0; i < limit; ++i)     // empty loop: measures the loop overhead itself
   ;
int t1=getTime();
int loopoverhead = t1-t0;
t0=getTime();
for(i=0; i < limit; ++i)
    function();
t1=getTime();
// Average time per call, with the loop overhead subtracted
double tfunction = (1.0/limit)*((t1-t0)-loopoverhead);
jfklein
  • 877
  • 1
  • 11
  • 13
0

Run it 1000 times as 100 iterations * 10 iterations, where you unroll the inner loop to minimize overhead. Then seconds translate to milliseconds.
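A sketch of what that might look like (my illustration, with clock() as a stand-in timer and function() as the code under test):

// 100 timed iterations, with the inner "loop" unrolled 10 times
// so the loop-control overhead is negligible: 1000 calls in total.
clock_t start = clock();
for (int i = 0; i < 100; ++i)
{
    function(); function(); function(); function(); function();
    function(); function(); function(); function(); function();
}
clock_t stop = clock();

double total_seconds = (double)(stop - start) / CLOCKS_PER_SEC;
// With 1000 calls, the total in seconds is numerically the time per call in ms.
double ms_per_call = total_seconds;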

As others have pointed out, this is a good way to measure how long it takes.

However, if you also want to make it take less time, that is a different goal, and needs a different technique. My favorite is this.

Community
  • 1
  • 1
Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135