0

If I want to compare the speed of two algorithms, what's the best way? I am familiar with the mathematical way (asymptotic analysis), and I know I don't fully master it yet =/ I'm interested in knowing how to time two algorithms with good precision on Windows. What are the best APIs to use? Is time() from time.h reliable, or do I need something better?

An example would be good too! Thank you!

toto
  • 880
  • 11
  • 21
  • Possible duplicate of [How to Calculate Execution Time of a Code Snippet in C++](https://stackoverflow.com/questions/1861294/how-to-calculate-execution-time-of-a-code-snippet-in-c) – norok2 Sep 23 '19 at 17:51

3 Answers

2

What you are looking for is QueryPerformanceCounter and QueryPerformanceFrequency. These are Windows API functions for accessing high-resolution timers. They should yield better precision than time or GetSystemTime.

#include <windows.h>

LARGE_INTEGER time1;
QueryPerformanceCounter(&time1);

// Your code

LARGE_INTEGER time2;
QueryPerformanceCounter(&time2);

// The frequency is fixed at boot, so it only needs to be queried once.
LARGE_INTEGER ticksPerSecond;
QueryPerformanceFrequency(&ticksPerSecond);
double seconds = (double)(time2.QuadPart - time1.QuadPart) / ticksPerSecond.QuadPart;

And here I found an implementation of a CStopWatch class that could be used as an example.

detunized
  • 15,059
  • 3
  • 48
  • 64
2

Unfortunately this is not such an easy task.

As previously mentioned, QueryPerformanceCounter is a viable choice.
Other possibilities are:
- GetTickCount (not advisable: its resolution is tied to the scheduler's timer tick, typically 10-16 ms)
- timeGetTime: by default it has ~15 ms accuracy, matching the default timer tick used by the task scheduler (15 ms on my computer). You can force timeGetTime to be more accurate by changing the system-wide timer period with a call to timeBeginPeriod. However, this should only be used as a temporary hack while measuring. Please do not use it in your release code!
- Querying the processor's time stamp counter (RDTSC): this requires assembler programming (or a compiler intrinsic), and I would not recommend it.

As far as QueryPerformanceCounter is concerned, you can find an easy to use wrapper here : http://www.codeproject.com/KB/datetime/perftimer.aspx

You can use it this way:

CPerfTimer t;
t.Start();
CallExpensiveTask();
std::cout << "Time (ms) " << t.Elapsedms();

A few pieces of advice, however:

  • As mentioned, run your function many times in order to get a reliable measurement
  • Beware of the task scheduler: on an average computer each thread runs for up to ~15 ms before another thread is scheduled. If the scheduler preempts your measured function mid-call, you might measure a time that is much higher (around 15 ms higher)
  • Note that Sleep(1) yields a roughly 15 ms pause (since the scheduler will switch to a different thread)
  • Remember that QueryPerformanceCounter can (rarely) give inaccurate results: on multicore processors, you might sometimes see a negative time lapse (!). In that case, you should redo your measurement (see http://www.virtualdub.org/blog/pivot/entry.php?id=106 )
  • Temporary workarounds for the scheduler effect: raise your process priority (you can do it through Task Manager), or hack the scheduler with timeBeginPeriod (http://msdn.microsoft.com/en-us/library/dd757624(v=VS.85).aspx), which can change the time allocated to each task from 15 ms to lower values (remember not to include this in your release code, since it is a system-wide setting that can reduce global performance...)

Some more notes about the processor's time stamp counter

  • I did not test its accuracy myself, but http://en.wikipedia.org/wiki/Time_Stamp_Counter is a good source of information about it, and about its limitations (esp. on multicore processors, and also on processors with variable clock frequencies)
  • AFAIK, it is advisable to use this kind of timer on a single thread for which you set the processor affinity
  • An example of implementation (found at http://developer.nvidia.com/object/timer_function_performance.html) could be:
#pragma warning (push)
#pragma warning (disable : 4035)    // disable "no return value" warning

__forceinline DWORD GetPentiumCounter()
{
    __asm
    {
        xor   eax,eax             // VC won't realize that eax is modified w/out this
                                  //   instruction to modify the val.
                                  //   Problem shows up in release mode builds
        _emit 0x0F                // RDTSC: high-freq counter to edx:eax
        _emit 0x31                // only care about the low 32 bits in eax

        xor   edx,edx             // so VC gets that edx is modified
    }
    // return value is left in eax (hence the disabled warning)
}

#pragma warning (pop)
Pascal T.
  • 3,866
  • 4
  • 33
  • 36
  • Ok this answer is more interesting. When you talk about querying the processor's time stamp counter, how accurate is it? Because I don't mind using assembly code, I'm comfortable with that. And are you sure about the 15 ms quantum? That's a lot of time, I thought it did context switching earlier than that :) – toto Dec 13 '10 at 02:14
  • 2
    @toto It is more complicated than just saying 15ms. http://support.microsoft.com/kb/259025 note a thread may get between 3-9 quantums where 3 quantums is somewhere between 10 and 15 ms. Getting around the QueryPerformanceCounter issue is as easy as setting processor affinity for your test app. – stonemetal Dec 13 '10 at 02:24
  • 1
    @toto: Are you sure you need such accuracy? Just run your algorithm in a loop to make the total execution time a minute or so, and then compare to the other algorithm run the same number of times. This should give you an idea of how big the difference is. And make sure nothing heavy runs in the background; your process/thread will then get 100% (not really, but close enough for practical purposes). – detunized Dec 13 '10 at 08:02
  • Alright, thanks. By the way, I liked your answer, but I chose this one because yours was stuff I already knew, while this one is more detailed and there's some new stuff for me. – toto Dec 13 '10 at 08:44
0

QueryPerformanceCounter/QueryPerformanceFrequency. AFAIK, they are fixed now and actually measure cycles. Just remember to raise the priority of the thread so that it is less impacted by other threads while you measure.

Preemption can cause quite a bit of error if the runtime of the measured code is low.

So raise the priority and rerun the tests multiple times.

Coder
  • 3,695
  • 7
  • 27
  • 42
  • Are you sure that QueryPerformanceCounter was fixed? Are you talking about the problem described at http://www.virtualdub.org/blog/pivot/entry.php?id=106 – Pascal T. Dec 13 '10 at 01:47
  • No, I was thinking about a hardware bug that shifted the performance counter on multicore machines by a whole lot: http://support.microsoft.com/kb/895980. SpeedStep will still affect the counter, so a watchdog with GetTickCount is necessary to detect core slowdowns/speedups if the application could run in such environments. But for most benchmarking tasks, QPC is good enough nowadays. – Coder Dec 13 '10 at 01:55
  • Speedstep does **not** affect QPC. That is precisely why it is preferred over `RDTSC`. The *only* thing that may potentially break QPC is... buggy hardware/drivers/bios. But Speedstep and other such tricks do not affect it. (and it doesn't measure clock cycles specifically) – jalf Dec 13 '10 at 03:44