On recent x86, RDTSC returns some pseudo-counter that measures time instead of clock cycles.

Given this, how do I measure actual clock cycles for the current thread/program?

Platform-wise, I prefer Windows, but a Linux answer works too.

user541686

1 Answer

This is not simple. The relevant behaviour is described in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Vol. 3B.

Here is the behaviour:

  • For Pentium M processors; for Pentium 4 processors, Intel Xeon processors; and for P6 family processors: the time-stamp counter increments with every internal processor clock cycle. The internal processor clock cycle is determined by the current core-clock to bus-clock ratio. Intel® SpeedStep® technology transitions may also impact the processor clock.
  • For Pentium 4 processors, Intel Xeon processors; for Intel Core Solo and Intel Core Duo processors; for the Intel Xeon processor 5100 series and Intel Core 2 Duo processors; for Intel Core 2 and Intel Xeon processors; for Intel Atom processors: the time-stamp counter increments at a constant rate. That rate may be set by the maximum core-clock to bus-clock ratio of the processor or may be set by the maximum resolved frequency at which the processor is booted. The maximum resolved frequency may differ from the processor base frequency. On certain processors, the TSC frequency may not be the same as the frequency in the brand string.

Here is the advice for your use case:

To determine average processor clock frequency, Intel recommends the use of performance monitoring logic to count processor core clocks over the period of time for which the average is required. See Section 18.17, “Counting Clocks on systems with Intel Hyper-Threading Technology in Processors Based on Intel NetBurst® Microarchitecture,” and Chapter 19, “Performance-Monitoring Events,” for more information.
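As a rough illustration of the recommendation quoted above: the average frequency over an interval is the number of core clocks counted divided by the length of that interval. A minimal sketch in C, where `average_core_hz` and both of its inputs are hypothetical names (how the cycle count is obtained from the performance-monitoring hardware is a separate matter):

```c
#include <stdint.h>

/* Average core frequency in Hz over an interval.
 * core_cycles:     core clocks counted by a performance counter (assumed input)
 * elapsed_seconds: wall-clock length of the same interval (assumed input)
 * Returns 0.0 for a non-positive interval instead of dividing by zero. */
double average_core_hz(uint64_t core_cycles, double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0)
        return 0.0;
    return (double)core_cycles / elapsed_seconds;
}
```

If the counter only ticks while the core is not halted, idle time inside the interval lowers the result, so this is an average over the whole period rather than the frequency while busy.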

The bad news is that, AFAIK, performance counters are often not portable between AMD and Intel processors, so you will need to check the AMD documentation to find which counters to use there. There are also complications: you cannot easily measure the number of cycles taken by arbitrary code. For example, the processor can be halted or enter a sleep state for a short period of time (see C-states), or the OS can be executing protected code that cannot be profiled without elevated privileges (for security reasons). This method is fine as long as you are measuring the cycle count of numerically intensive code that runs for a relatively long time (at least several dozen cycles). On top of all that, the documentation and usage of MSRs is pretty complex, and there are some restrictions.

Performance counters like CPU_CLK_UNHALTED.THREAD and CPU_CLK_UNHALTED.REF_TSC seem like a good start for what you want to measure. Using a library to read such performance counters is generally a very good idea (unless you like having a headache for at least a few days). PAPI might be enough to do the job.
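For Linux specifically, one way to count core cycles for the current thread from inside the program, without PAPI, is the `perf_event_open` system call. The following is a minimal sketch, not a definitive implementation: the helper names (`cycles_open`, `cycles_start`, `cycles_stop`) are made up for illustration, it assumes the kernel's generic `PERF_COUNT_HW_CPU_CYCLES` event is an acceptable stand-in for the vendor-specific unhalted-core-cycles event, and it can fail to open the counter when `/proc/sys/kernel/perf_event_paranoid` is restrictive or the PMU is not exposed (for example in some virtual machines). It does not cover Windows.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a counter for user-space core clock cycles of the calling thread.
 * Returns a file descriptor, or -1 on failure (errno is set). */
int cycles_open(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;       /* created stopped; enabled in cycles_start() */
    attr.exclude_kernel = 1; /* do not count while in kernel mode */
    attr.exclude_hv = 1;     /* do not count while in the hypervisor */
    /* pid = 0, cpu = -1: this thread, on whichever CPU it runs.
     * glibc has no wrapper for this call, hence syscall(). */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Zero the counter and start counting. Returns 0 on success, -1 on error. */
int cycles_start(int fd)
{
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) == -1)
        return -1;
    return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
}

/* Stop counting and store the cycle count. Returns 0 on success, -1 on error. */
int cycles_stop(int fd, uint64_t *cycles)
{
    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) == -1)
        return -1;
    /* With the default read_format, read() yields one 64-bit count. */
    return read(fd, cycles, sizeof *cycles) == (ssize_t)sizeof *cycles ? 0 : -1;
}
```

A caller would open the counter once, wrap each region of interest between `cycles_start(fd)` and `cycles_stop(fd, &n)`, and `close(fd)` at the end. The start and stop calls are system calls themselves, so this only makes sense for regions much longer than their overhead.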


Here are some interesting related posts:

Jérôme Richard
  • Thanks for the pointers, I'll check those out! +1 – user541686 Nov 21 '22 at 03:19
  • @user541686: To measure user-space core clock cycles for a program under Linux, `perf stat --all-user ./my_program` counts the `cycles` hardware PMU event, along with `instructions` and a few others by default. `--all-user` gets it to program the PMUs to count only while the CPU is at a non-kernel privilege level. It's pretty easy unless you want to get down to tiny measurement intervals, where you'd use `rdpmc` in a program to time itself (after making a system call to program the PMU); at that point, measurement overhead and out-of-order exec become a huge problem. – Peter Cordes Nov 21 '22 at 08:28
  • @PeterCordes: Thanks. Yeah, I'm hoping to avoid an external profiler program since the cycle counts I want to measure occur during specific intervals in the middle of my program. They're nowhere near small enough for me to worry about OOO and all that, but the overall program runs for way longer and has much more logic and sources of noise that get in the way of measuring what I want. (And yes, obviously in the worst case I'd put some work into splicing out the hot paths and measuring them in isolation, but it's sometimes a massive pain.) – user541686 Nov 21 '22 at 08:42
  • @user541686: There are a couple of tricks for that: spawn a `perf stat -p ` that attaches to your program at a certain point ([example](https://stackoverflow.com/questions/26267588/perf-stat-for-part-of-program)). Or with more modern `perf`, [Enable/disable perf event collection programmatically](https://stackoverflow.com/q/70314376) lets the program being profiled control `perf stat` by writing to a pipe. – Peter Cordes Nov 21 '22 at 08:46