What I mean is, to measure the performance of an algorithm I need to do it in terms of some unit. So, basically, if I ignore declarations and definitions of objects and consider only operations, how would you estimate the time each operation consumes at run time?
-
What have you tried? A simple way would be to measure how long it takes to do 100000 operations and divide that by 100000. Then repeat several times and take the average. To do that you need to know how to measure the time for 100000 operations, do you know how to do that, or is that what you're asking? – Jonathan Wakely Feb 18 '13 at 18:52
-
Well, basically I don't know how to measure time. Does it make sense to do this? I mean, it's probably not right because it might depend on the operations? – Daniela Diaz Feb 18 '13 at 18:57
-
search for 'c++ measure time' (and perhaps mention your platform in the query...) – Karoly Horvath Feb 18 '13 at 18:59
-
Possible duplicate: http://stackoverflow.com/questions/1949752/how-to-know-calculate-the-execution-time-of-an-algorithm-in-c – Jonas Schäfer Feb 18 '13 at 18:59
-
@DanielaDiaz Measuring time is simple (except under Windows, where the `clock()` function is broken). Designing the test so that you measure what you want, and not how well the compiler has optimized, or whether you're getting cache hits, or something else, is much more difficult. – James Kanze Feb 18 '13 at 19:22
4 Answers
In C++11 you have the std::chrono library. Otherwise, you can use boost::chrono. Those two are very basic time-measurement libraries you can start with.
EDIT: In C++03 you already have the `clock()` function, but as noted in the comment section it measures CPU time rather than wall-clock time and is not a very accurate measurement on all platforms. Plus, most platforms also supply some sort of native API for time-related operations.
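As a minimal sketch of how std::chrono might be used (the loop being timed below is just a placeholder for whatever work you want to measure):

```cpp
#include <chrono>
#include <iostream>

int main() {
    using clock = std::chrono::steady_clock;

    auto start = clock::now();
    volatile long long sum = 0;          // placeholder work; volatile keeps it from being optimized away
    for (int i = 0; i < 1000000; ++i)
        sum = sum + i;
    auto end = clock::now();

    auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    std::cout << "elapsed: " << us.count() << " us\n";
}
```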
How to measure performance depends on the context. First, you'll want to compare performance on a large enough and diverse enough data sample. Large enough means the run takes at least a measurable number of seconds. Diverse enough means it covers all the run scenarios, either uniformly or perhaps in separate runs (you should at least know what slice of the real workload the sample data represents).
For simple enough cases, the first thing you'll want to do is partition your algorithm into as many segments as needed, let's call them A_1, ..., A_N (or perhaps A_1, ..., A_N are different solutions to the same problem; the concept still holds). What you'll want to do is measure the amount of time spent in each segment over the same sample data. In pseudo-code:
```
times <- (0, ..., 0)                 // vector of size N
for each input in sample data:
    start = now()
    // run A_1 on input
    end = now()
    times[1] <- times[1] + (end - start)
    ...
    start = now()
    // run A_N on input
    end = now()
    times[N] <- times[N] + (end - start)
```
At the end of this run you are left with a times vector which shows you how much time you've spent in each element. That is the most basic approach. But the key element to profiling is: there are many ways to dissect a frog. You could also look at how much time you are spending on a specific operation which could be called from many partitions of the algorithm. You could choose to look at the number of load/store operations, cache performance etc. The variations are numerous and great reward can often come from unexpected sources (at least to the unwary observer).
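As a rough C++11 sketch of that loop using std::chrono (the two segment functions and the sample data below are placeholders, assuming N = 2):

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical segments of the algorithm under test.
void segment_a(int input) { (void)input; /* ... work for segment 1 ... */ }
void segment_b(int input) { (void)input; /* ... work for segment 2 ... */ }

int main() {
    using clock = std::chrono::steady_clock;
    std::vector<int> sample_data = {1, 2, 3, 4, 5};                 // placeholder sample data
    std::vector<clock::duration> times(2, clock::duration::zero());

    for (int input : sample_data) {
        auto start = clock::now();
        segment_a(input);
        times[0] += clock::now() - start;

        start = clock::now();
        segment_b(input);
        times[1] += clock::now() - start;
    }

    for (std::size_t i = 0; i < times.size(); ++i) {
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(times[i]);
        std::cout << "segment " << i + 1 << ": " << us.count() << " us\n";
    }
}
```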
For more advanced profiling, you can use a 3rd party profiling framework (some IDEs come with built-in profiling capabilities).

-
In C++03, you already had `clock()`. Measuring the time isn't the problem. Designing the test so that it measures what you want is. – James Kanze Feb 18 '13 at 19:22
-
`clock()` measures CPU time rather than real time. To measure real time C++03 only has `time()`, which typically does not have sub-second resolution. – bames53 Feb 18 '13 at 19:38
-
@bames53: You could always divide the result returned from clock() by the CLOCKS_PER_SEC macro to get time in seconds – eladidan Feb 18 '13 at 19:49
-
@eladidan Why do people say that clock() has very low precision? [http://stackoverflow.com/questions/6473438/measure-time-elapsed-in-c](http://stackoverflow.com/questions/6473438/measure-time-elapsed-in-c) – Daniela Diaz Feb 18 '13 at 20:08
-
@eladidan That's still CPU time and not real time. That is, a program could run for 1 second and `clock()` may report that the program used 10 seconds of CPU time, or 0.5 seconds, or whatever. CPU time is not wall clock time. – bames53 Feb 18 '13 at 20:18
-
Okay, good points... I only added it as an afterthought. I'll add the reservations to the answer – eladidan Feb 18 '13 at 20:33
On a modern machine, it depends a lot on context. Most instructions will execute in a single clock; on some machines, you can even get several instructions per clock. On the other hand, memory access times can vary enormously, depending on whether the location being accessed is in cache, in main memory, or swapped out: if the OS has to go to disk in order to map the memory, you're talking about milliseconds for the access, whereas if the data is in the top-level cache, it could be almost immediate. (In other words, we're talking about hundreds of picoseconds versus maybe 10 milliseconds.) This is why you'll hear so much about locality.
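To see the locality effect described above in practice, a small sketch (the matrix size is illustrative, and the actual numbers depend entirely on your hardware):

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    const std::size_t n = 4096;                    // illustrative size: a 64 MB matrix of int
    std::vector<int> m(n * n, 1);
    using clock = std::chrono::steady_clock;

    // Row-major traversal: consecutive addresses, cache friendly.
    long long sum1 = 0;
    auto t0 = clock::now();
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c)
            sum1 += m[r * n + c];
    auto t1 = clock::now();

    // Column-major traversal: large strides, far more cache misses.
    long long sum2 = 0;
    for (std::size_t c = 0; c < n; ++c)
        for (std::size_t r = 0; r < n; ++r)
            sum2 += m[r * n + c];
    auto t2 = clock::now();

    using ms = std::chrono::milliseconds;
    std::cout << "row-major:    " << std::chrono::duration_cast<ms>(t1 - t0).count() << " ms\n"
              << "column-major: " << std::chrono::duration_cast<ms>(t2 - t1).count() << " ms\n";
    if (sum1 != sum2) std::cout << "mismatch\n";   // keep the sums live so the loops aren't optimized away
}
```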

There are many ways to measure performance, and the "right" way to do it depends a lot on what you are actually doing. Quite often, the simplest way to determine performance is to let the application do what it normally does and just measure the time: if a run takes long enough, a stop-watch (e.g. a phone app, the stopwatch function on an actual wrist-watch, or similar) will work just fine. For shorter periods of time you may need other methods, going all the way down to counting CPU clock cycles with the TSC, the timestamp counter in the CPU itself.
If it's something that only takes a short amount of time, but your application will do a lot of it, then run the same code many times and divide the total time by the number of iterations.
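As a rough sketch of that repeat-and-divide idea using the TSC on x86 (this assumes GCC/Clang's <x86intrin.h>; with MSVC the __rdtsc intrinsic comes from <intrin.h> instead, and cycle counts are distorted by frequency scaling and out-of-order execution, so treat them as ballpark figures):

```cpp
#include <cstdint>
#include <iostream>
#include <x86intrin.h>   // __rdtsc on GCC/Clang targeting x86

// Hypothetical tiny operation whose cost we want to estimate.
static inline std::uint64_t tiny_op(std::uint64_t x) { return x * 2654435761u + 1; }

int main() {
    const std::uint64_t iterations = 10000000;
    std::uint64_t x = 1;

    std::uint64_t start = __rdtsc();
    for (std::uint64_t i = 0; i < iterations; ++i)
        x = tiny_op(x);                      // loop-carried dependency keeps the work honest
    std::uint64_t end = __rdtsc();

    std::cout << "approx. " << (end - start) / static_cast<double>(iterations)
              << " cycles per operation (result " << x << " printed to defeat optimization)\n";
}
```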
For other applications it gets harder, because the time isn't really spent on CPU work. For example, if part of your application is reading a large file, the CPU time used may be only 2-3%, and the rest of the time is spent waiting for the hard disk to fetch the data from the actual platters.
If the application is using networking, again, the CPU probably accounts for only a few percent of the time it takes to send a packet of data over a 1 Gbit network link.
The type of data, and the way the data is organized, sorted, and used, also matter. Inserting items into a sorted structure from a source that is already sorted may perform completely differently from inserting the same items in unsorted order, and, surprisingly, it can be better OR worse, depending on how the data is being stored/sorted.
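For instance, a small sketch comparing insertion of already-sorted versus shuffled keys into a std::set (which order wins, and by how much, depends on the standard library implementation and the data, which is exactly the point):

```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <random>
#include <set>
#include <vector>

int main() {
    const int n = 500000;                                       // illustrative size
    std::vector<int> sorted_keys(n);
    std::iota(sorted_keys.begin(), sorted_keys.end(), 0);       // 0, 1, 2, ...
    std::vector<int> shuffled_keys = sorted_keys;
    std::shuffle(shuffled_keys.begin(), shuffled_keys.end(), std::mt19937{42});

    using clock = std::chrono::steady_clock;
    auto time_insert = [](const std::vector<int>& keys) {
        auto start = clock::now();
        std::set<int> s(keys.begin(), keys.end());              // build the set from the keys
        auto end = clock::now();
        return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    };

    std::cout << "sorted input:   " << time_insert(sorted_keys)   << " ms\n"
              << "shuffled input: " << time_insert(shuffled_keys) << " ms\n";
}
```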
It can be quite hard to "look at code" and determine its performance. There are so many other factors than "what the code looks like" that affects it.
If you have an existing application that works, measuring its performance and using profiling software to see where the time is spent is a good plan.

All of these answers measure performance in a way that depends on the hardware you run your algorithm on. If you want a generic measure of how complex your algorithm is, you should do big-O and big-Omega analysis.
Let me know if you have more questions about this, but going by your question title, you should be looking at the number of operations your algorithm performs as a function of its input size, rather than measuring performance on specific hardware.
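As a toy illustration of counting operations instead of wall-clock time (the functions and counters below are made up for the example): the linear scan does at most n comparisons, so it is O(n), while the pairwise check does about n*(n-1)/2, so it is O(n^2), regardless of the machine it runs on.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Count comparisons instead of measuring time: the counts depend only on the
// input size, not on the hardware the code runs on.
std::size_t linear_search_comparisons(const std::vector<int>& v, int key) {
    std::size_t comparisons = 0;
    for (int x : v) {
        ++comparisons;
        if (x == key) break;                 // worst case: key absent, n comparisons -> O(n)
    }
    return comparisons;
}

std::size_t pairwise_check_comparisons(const std::vector<int>& v) {
    std::size_t comparisons = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j) {
            ++comparisons;                   // compare v[i] with v[j]
            if (v[i] == v[j]) { /* duplicate found; keep counting for the worst case */ }
        }
    return comparisons;                      // about n*(n-1)/2 -> O(n^2)
}

int main() {
    std::vector<int> data(1000);
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = static_cast<int>(i);

    std::cout << "linear search:  " << linear_search_comparisons(data, -1)  << " comparisons\n"
              << "pairwise check: " << pairwise_check_comparisons(data)     << " comparisons\n";
}
```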
