
The QueryPerformanceFrequency() and QueryPerformanceCounter() functions are said to be the best choice, according to the MSDN article Game Timing and Multicore Processors. If they are not supported, I can fall back to timeGetTime() or just GetTickCount().

  1. Does QueryPerformanceFrequency() report the CPU clock, or does it use a separate clock with its own frequency that does not change over time?
  2. What if the frequency changes randomly over time (especially on laptops)?
  3. How do I use the SetThreadAffinityMask() function? (Some code I have seen sets the mask to "1", reads the counter, and then restores the old mask. Why is that? Is it correct?)
  4. Is it correct to call QueryPerformanceFrequency() only once and compute delta times by dividing by that frequency, as in question 1? Or does the situation in question 2 break that?
Peter Mortensen
Deamonpog
  • where aren't they supported? – Andriy Tylychko Jan 16 '13 at 16:49
  • please specify what kind of software you're developing: e.g. desktop app, 3D game... – Andriy Tylychko Jan 16 '13 at 16:53
  • @AndyT: just in case :P. I have several applications in mind. Games and a windows service for my other project which is a child monitor. In the child monitor system I am counting time just in case the child can change the system time :P. – Deamonpog Jan 16 '13 at 17:32
  • QPF delivers a constant, no matter what. **But:** Its source is hardware and therefore the `constant QPF` is only an estimate. See [this](http://stackoverflow.com/q/12971110/1504523) SO question to get a closer look. – Arno Jan 16 '13 at 17:56
  • there was a good answer, did somebody remove it??? – Deamonpog Jan 16 '13 at 19:11

3 Answers

  1. QPC's underlying implementation varies widely. In some cases it is the CPU clock, but usually it isn't.
  2. That will affect RDTSC, but not QPC.
  3. That is to prevent the thread from moving from one CPU core to another. It may help avoid high-resolution timing methods reporting negative time passing (it happens...). It is generally not recommended, though.
  4. The frequency of QPC is constant, at least on a given system, at least until reboot.
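A minimal sketch of the pin-then-restore pattern from point 3. The helper name `pinned_sample` is mine; the Windows branch uses the real SetThreadAffinityMask (which returns the previous mask, so it can be restored), and the POSIX branch is shown only so the same idea compiles off Windows:

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
bool pinned_sample() {
    HANDLE th = GetCurrentThread();
    DWORD_PTR old_mask = SetThreadAffinityMask(th, 1); // pin to CPU 0
    if (old_mask == 0) return false;                   // call failed
    LARGE_INTEGER t;
    QueryPerformanceCounter(&t);                       // read while pinned
    SetThreadAffinityMask(th, old_mask);               // restore the old mask
    return true;
}
#else
#include <pthread.h>
#include <sched.h>
bool pinned_sample() {
    cpu_set_t old_set, pin;
    pthread_t th = pthread_self();
    if (pthread_getaffinity_np(th, sizeof(old_set), &old_set) != 0)
        return false;                                  // save the old mask
    CPU_ZERO(&pin);
    CPU_SET(0, &pin);                                  // pin to CPU 0
    if (pthread_setaffinity_np(th, sizeof(pin), &pin) != 0)
        return false;
    // ... read the timer here while pinned ...
    pthread_setaffinity_np(th, sizeof(old_set), &old_set); // restore
    return true;
}
#endif
```

The key point in both branches is the same: save the previous mask before pinning, read the counter, then restore it so the scheduler regains its freedom.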

But you're not necessarily asking the right questions...

The four commonly used timing functions on Windows are: GetTickCount, timeGetTime, QueryPerformanceCounter (QPC), and RDTSC.

My recommendations among those:

Game logic timing should be done with timeGetTime. It is simple, reliable, and has sufficient resolution for that purpose. (Edit: the default resolution varies; you can call timeBeginPeriod to force it to 1 millisecond resolution, though.)
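A sketch of pairing timeBeginPeriod/timeEndPeriod around timeGetTime-based frame timing. The Windows branch uses the real winmm API; `frame_delta_ms` is a name made up for this example, and the non-Windows branch exists only so the sketch compiles elsewhere:

```cpp
#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>                  // timeGetTime; link with winmm.lib
// At startup:  timeBeginPeriod(1);    // request 1 ms timer resolution
// At shutdown: timeEndPeriod(1);      // must be paired with timeBeginPeriod
static long frame_delta_ms(long &last_ms) {
    DWORD now = timeGetTime();         // milliseconds since system start
    long dt = (long)(now - (DWORD)last_ms);
    last_ms = (long)now;
    return dt;                         // per-frame delta for game logic
}
#else
#include <chrono>
static long frame_delta_ms(long &last_ms) {
    using namespace std::chrono;
    long now = (long)duration_cast<milliseconds>(
        steady_clock::now().time_since_epoch()).count();
    long dt = now - last_ms;
    last_ms = now;
    return dt;
}
#endif
```

Calling `frame_delta_ms` once per frame with a persistent `last_ms` gives the millisecond delta the game loop consumes.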

GetTickCount should not be used. Its resolution is too poor for either game logic or performance monitoring (64 Hertz, a nasty frequency as it creates a beat frequency with the typical monitor refresh rate). It is the fastest timing function call IIRC, but I can't find a scenario in which that makes up for its poor resolution. (Edit: rumor has it that timeBeginPeriod can improve the resolution of GetTickCount; that rumor is FALSE.)

RDTSC and QPC are both too unreliable/quirky for simple game logic timing, but better suited to performance measurement. RDTSC has issues that make it painful to use if you want units independent of CPU frequency changes, and you usually need assembly to use it. QPC usually just works... but when it goes wrong it can go very wrong, and it goes wrong in a very wide variety of ways (sometimes it's really slow, sometimes it has frequent small negative deltas, sometimes it has infrequent large negative deltas (not wrap-arounds), sometimes it's just completely psychotic, etc.).

RDTSC is pretty much always faster, and usually has significantly better resolution. Overall I prefer RDTSC for in-house use just because it is faster and thus produces fewer distortions in the times it is measuring. On customers' machines it's a much closer call: QPC is easier to justify due to Microsoft pushing it, and it works without complications more often, but the wide variety of ways it can fail on customer machines that you'll never see in-house is a major drawback in my view.
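If you do use such a counter for game timing anyway, the small negative deltas described above are typically absorbed by clamping, so a backwards jump reads as "no time passed". A minimal sketch (the helper name is mine):

```cpp
#include <cstdint>

// Clamp a tick delta so occasional backwards counter jumps never
// propagate a negative elapsed time into the game logic.
int64_t clamped_delta(int64_t prev_ticks, int64_t now_ticks) {
    int64_t d = now_ticks - prev_ticks;
    return d < 0 ? 0 : d;   // treat a backwards jump as zero elapsed time
}
```

This hides small glitches but not the large jumps or slowness also described above, which is part of why the answer steers game logic toward timeGetTime instead.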

user3535668
  • With all due respect, would you mind telling me what is your reference or experience ? – Deamonpog Apr 17 '14 at 03:22
  • Mostly from writing games and performance monitoring code that uses these functions, of course. Admittedly much of my knowledge of obscure QPC bugs comes from talking to a game developer (Shizmoo) that briefly tried to use QPC for timing game logic and discovered the hard way why that's a bad idea, circa 2002 IIRC - I've only personally seen three or four ways QPC can fail, as I haven't tested it on that many machines. – user3535668 Nov 26 '14 at 21:52
  • Oki, thanks :), BTW, what is circa 2002? (is it this apple ipad ,http://www.dvice.com/archives/2012/07/7_photos_of_an.php ?) can you please point out these bugs/how to reproduce the errors / any info on what happens? – Deamonpog Dec 04 '14 at 06:37
  • "circa 2002" as in around the year 2002. If you're looking for a game title, I have no idea, it was probably an ActiveX thing though since I think most of their work was. – user3535668 Dec 07 '14 at 10:52

QPF/QPC are the best choice if you need a high-precision timer (the tick period is typically well under a microsecond, but that doesn't mean the precision is 1 tick). Otherwise, just use GetTickCount() (in milliseconds). Both should properly handle variable CPU frequency (for example, on laptops with power-saving options).

I have no idea how an affinity mask can help to retrieve the system time.

The proper way to get a high-precision time is to call both QPF and QPC and calculate time as:

double seconds = QPC / QPF;

EDIT:

GetTickCount() has poor precision, something like 5 milliseconds, but it is still suitable for most applications. For measuring really small time periods, the only option is QPC/QPF.
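Fleshing that QPC/QPF division out into a minimal sketch. The helper name `seconds_now` is made up for this example, and the non-Windows branch (clock_gettime) exists only so the snippet stays portable:

```cpp
#ifdef _WIN32
#include <windows.h>
static double seconds_now() {
    LARGE_INTEGER freq, now;
    QueryPerformanceFrequency(&freq);   // ticks per second; fixed per boot
    QueryPerformanceCounter(&now);      // current tick count
    return (double)now.QuadPart / (double)freq.QuadPart;
}
#else
#include <ctime>
static double seconds_now() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}
#endif

// Elapsed time in seconds; the division by the frequency has already
// converted ticks, so deltas are frequency-independent.
double measure_delta() {
    double start = seconds_now();
    // ... work being timed ...
    double end = seconds_now();
    return end - start;
}
```

In a real program you would call QueryPerformanceFrequency once at startup and cache it rather than re-query it on every sample.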

Peter Mortensen
Andriy Tylychko
  • "most reliable...: QPC / QPF" How so? If QPF can change at all, what if it changed right after the call to QPC? I'm not trying to pick on anyone, I just want to understand your rationale. Do we have any evidence that QPF can change and if so, when? – 500 - Internal Server Error Jan 16 '13 at 17:21
  • Its a multithreaded multicore system. So i have to make sure its running on a single processor as the MSDN says. Its them who say that we should use that function in order to do so. – Deamonpog Jan 16 '13 at 17:37
  • @500-InternalServerError: sorry, explained it bad, corrected my answer a bit. Context: high-precision time typically is required to calculate small time periods, and it's the most reliable way on Windows platform. – Andriy Tylychko Jan 16 '13 at 17:38
  • I find timeGetTime() as a better solution over GetTickCount(). dont you? its kinda flexible and is faster than GetTickCount(). said it just in case someone else pokes around for answers. :) – Deamonpog Jan 16 '13 at 17:48
  • @Deamonpog: timeGetTime() is a multimedia timer with the same poor precision as GetTickCount(), but requires linking with winmm.lib. Performance difference (if any) usually doesn't matter for their typical applications. – Andriy Tylychko Jan 16 '13 at 17:57
  • QPF cannot be considered as being constant over longer periods of time. It also does not return the true number. – Arno Jan 16 '13 at 19:01
  • @AndyT : ya but you can change it with timeBeginPeriod() timeEndPeriod() functions. – Deamonpog Jan 16 '13 at 19:06
  • doesn't timeBeginPeriod() change the resolution of GetTickCount() as well? – Arno Jan 16 '13 at 19:10
  • aah, i think u r rite. :) thanks :) and you know what, i saw someone suggesting that it also changes the resolution of Sleep(). lolz.. i think its the effect that he measured it using something like timeGetTime() :P crazzzzyyy... hehe... thank you both Arno and Andy :) – Deamonpog Jan 16 '13 at 19:20

I personally prefer the time stamp counter, a 64-bit counter in the x86 architecture that increments on every internal clock cycle. It is read using the rdtsc instruction, which returns the counter value in the edx:eax register pair (x86-32) or rdx:rax (x86-64).

There have been issues with the instruction, but that was many years ago. Today, "green" power-saving features cause load-dependent execution frequency changes, which makes converting cycle counts into elapsed wall-clock time more difficult, but counting elapsed clock cycles themselves isn't a problem.

#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>      /* __rdtsc() intrinsic on MSVC */
#else
#include <x86intrin.h>   /* __rdtsc() intrinsic on GCC/Clang (x86 only) */
#endif

uint64_t startCycle, endCycle, elapsedCycles, overhead;

// @ start of program: measure the cost of reading the counter itself

overhead = __rdtsc();
overhead = __rdtsc() - overhead;

// preparing to measure

startCycle = __rdtsc();

// (sequence to measure)

endCycle = __rdtsc();

elapsedCycles = endCycle - startCycle - overhead;

The overhead of the instruction itself should be determined first. I have found that the overhead on Intel processors is smaller than on AMD processors. The overhead should be measured several times, say in a loop, to find the lowest possible value. The longer the sequence being measured, the less of an issue the overhead becomes. The instruction makes it possible to build permanent performance metering into an application, so its actual performance can be measured under normal (non-performance-testing) execution.

Due to pipelining and out-of-order execution, very short sequences should not be measured. Some suggest inserting the cpuid instruction before rdtsc, but that only means the measured clock count becomes greater than it actually is. I see cycle counts of 30 or so as merely indicative, whereas those around 100 or greater are generally reliable; in between there is a gray zone.
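The "measure the overhead several times and keep the lowest value" advice above can be sketched with the compiler's `__rdtsc` intrinsic (x86 only; `rdtsc_overhead` is a name made up for this example):

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>      // __rdtsc on MSVC
#else
#include <x86intrin.h>   // __rdtsc on GCC/Clang
#endif

// Take many back-to-back readings and keep the smallest delta:
// that is the best estimate of the instruction's own cost.
uint64_t rdtsc_overhead() {
    uint64_t best = ~0ull;
    for (int i = 0; i < 1000; ++i) {
        uint64_t t0 = __rdtsc();
        uint64_t t1 = __rdtsc();
        if (t1 - t0 < best) best = t1 - t0;
    }
    return best;
}
```

The minimum, rather than the average, is used because any interrupt or cache miss during a sample can only inflate the delta, never shrink it.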

Olof Forshell