How is QueryPerformanceCounter implemented?
The QPC timer has different implementations in the HAL depending on the hardware; it uses the TSC, HPET, RTC, APIC, ACPI or 8254 timers, depending on availability.
QPC timer resolution is hardcoded to 100 ns, but that doesn't matter because the call to QPC itself takes more than 100 ns. 100 ns is just a very, very short amount of time in the Windows world.
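To make the 100 ns figure concrete, here is a minimal sketch of converting QPC ticks to nanoseconds. It assumes the 100 ns tick corresponds to a reported frequency of 10 MHz; real code should always use whatever QueryPerformanceFrequency returns. The helper names qpc_ticks_to_ns and qpc_elapsed_ns are made up for illustration, and the Windows-specific part is guarded so the arithmetic can be checked anywhere.

```c
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Convert a tick count to nanoseconds for a counter running at freq_hz.
   Whole seconds and the remainder are converted separately so the
   multiplication by 1e9 does not overflow for large tick counts. */
uint64_t qpc_ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    uint64_t whole_seconds = ticks / freq_hz;
    uint64_t remainder     = ticks % freq_hz;
    return whole_seconds * 1000000000ull
         + remainder * 1000000000ull / freq_hz;
}

#ifdef _WIN32
/* Hypothetical helper: time one call to fn() with QPC. */
uint64_t qpc_elapsed_ns(void (*fn)(void))
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);   /* ticks per second */
    QueryPerformanceCounter(&start);
    fn();
    QueryPerformanceCounter(&stop);
    return qpc_ticks_to_ns((uint64_t)(stop.QuadPart - start.QuadPart),
                           (uint64_t)freq.QuadPart);
}
#endif
```

With a 10 MHz frequency a single tick maps to 100 ns, so anything shorter than that rounds down to zero ticks.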
RDTSC has resolution better than 1ns, but it varies with frequency
Not really, the TSC frequency is actually pretty stable since Nehalem. See Intel 64 Architecture SDM vol. 3A, "17.16 Invariant TSC":
Processor families increment the time-stamp counter differently:
For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4 processors, Intel Xeon processors (family [0FH], models
[00H, 01H, or 02H]); and for P6 family processors: the time-stamp
counter increments with every internal processor clock cycle.
The internal processor clock cycle is determined by the current
core-clock to bus-clock ratio. Intel SpeedStep technology transitions
may also impact the processor clock.
For Intel Xeon processors (family [0FH], models [03H and higher]); for Intel Core Solo and Intel Core Duo
processors (family [06H], model [0EH]); for the Intel Xeon processor
5100 series and Intel Core 2 Duo processors (family [06H], model
[0FH]); for Intel Core 2 and Intel Xeon processors (family [06H],
DisplayModel [17H]); for Intel Atom processors (family [06H],
DisplayModel [1CH]): the time-stamp counter increments at a constant
rate. That rate may be set by the maximum core-clock to bus-clock
ratio of the processor or may be set by the maximum resolved frequency
at which the processor is booted. The maximum resolved frequency may
differ from the processor base frequency, see Section 18.18.2 for more
detail. On certain processors, the TSC frequency may not be the same
as the frequency in the brand string.
The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC.
Processor’s support for invariant TSC is indicated by CPUID.80000007H:EDX[8].
The invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states. This is the architectural behavior
moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services
(instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with
a ring transition or access to a platform resource.
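The CPUID bit mentioned in the quote can be checked at startup. The following is a rough sketch, not a definitive implementation: the function names has_invariant_tsc and edx_reports_invariant_tsc are hypothetical, and it assumes an x86/x64 target built with either MSVC (the __cpuid intrinsic from intrin.h) or GCC/Clang (__get_cpuid from cpuid.h).

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <cpuid.h>
#endif

/* Bit 8 of EDX from CPUID leaf 80000007H signals invariant TSC. */
bool edx_reports_invariant_tsc(uint32_t edx)
{
    return ((edx >> 8) & 1u) != 0;
}

/* Returns true if the CPU advertises an invariant TSC. */
bool has_invariant_tsc(void)
{
#if defined(_MSC_VER)
    int regs[4];
    /* Leaf 80000000H returns the highest supported extended leaf in EAX. */
    __cpuid(regs, (int)0x80000000u);
    if ((uint32_t)regs[0] < 0x80000007u)
        return false;
    __cpuid(regs, (int)0x80000007u);
    return edx_reports_invariant_tsc((uint32_t)regs[3]);
#else
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_reports_invariant_tsc(edx);
#endif
}
```

Note that a hypervisor may hide this bit from a guest, so a false result inside a virtual machine does not necessarily describe the physical CPU.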
So for quick measurements you should be able to use __rdtsc or __rdtscp. You can calibrate for the TSC frequency at startup time and ensure it doesn't depend on CPU states. The thread could still be preempted though, so it's good to repeat the measurement multiple times or use QueryThreadCycleTime (though of course it comes with its own overhead). In practice I find RDTSC not as bad as it is presented in "Calculate system time using rdtsc", though the latter is still a good read.
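A sketch of the calibrate-then-repeat approach described above might look like the following. All function names here (ref_now_ns, calibrate_tsc_hz, min_cycles, sample_work) are invented for the example. It assumes an x86/x64 target; on Windows the reference clock is QPC, and elsewhere C11 timespec_get is used purely as a stand-in so the sketch still builds. Keeping the minimum over several runs is one simple way to discard runs where the thread was preempted, not the only one.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <x86intrin.h>
#endif
#if defined(_WIN32)
#include <windows.h>
#else
#include <time.h>
#endif

/* Reference clock in nanoseconds (QPC on Windows, wall clock otherwise). */
static uint64_t ref_now_ns(void)
{
#if defined(_WIN32)
    LARGE_INTEGER f, c;
    QueryPerformanceFrequency(&f);
    QueryPerformanceCounter(&c);
    uint64_t freq = (uint64_t)f.QuadPart, ticks = (uint64_t)c.QuadPart;
    return (ticks / freq) * 1000000000ull
         + (ticks % freq) * 1000000000ull / freq;
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Estimate TSC ticks per second by spinning for about interval_ns. */
double calibrate_tsc_hz(uint64_t interval_ns)
{
    uint64_t t0 = ref_now_ns();
    uint64_t c0 = __rdtsc();
    uint64_t t1;
    do {
        t1 = ref_now_ns();
    } while (t1 - t0 < interval_ns);
    uint64_t c1 = __rdtsc();
    return (double)(c1 - c0) * 1e9 / (double)(t1 - t0);
}

/* Placeholder workload, only here so the sketch has something to time. */
void sample_work(void)
{
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < 1000; ++i)
        sink += i;
}

/* Run fn() several times and keep the smallest cycle count, on the
   assumption that a preempted run shows up as an outlier on the high side. */
uint64_t min_cycles(void (*fn)(void), int runs)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; ++i) {
        uint64_t start = __rdtsc();
        fn();
        uint64_t stop = __rdtsc();
        if (stop - start < best)
            best = stop - start;
    }
    return best;
}
```

Dividing min_cycles(sample_work, 100) by calibrate_tsc_hz(50000000) gives an estimate in seconds. A longer calibration interval generally gives a steadier frequency estimate at the cost of startup time.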