Questions tagged [rdtsc]

RDTSC is the x86 "read time-stamp counter" instruction, often used for high-resolution timing.

See "How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures".

"Get CPU cycle count?" covers various caveats of using it: on modern x86, it measures reference cycles, not actual core clock cycles. (It also shows how to access it from C++.)

The earliest CPUs to support RDTSC had a fixed clock frequency, and some OSes found it was more useful as a low-overhead time source for time-of-day functions, so CPU vendors eventually changed it to what it is now: a fixed-frequency, nonstop counter.

The TSC can be out of sync across different cores. (Some CPUs avoid that for cores in the same physical package.)

137 questions
68
votes
2 answers

Difference between rdtscp, rdtsc : memory and cpuid / rdtsc?

Assume we're trying to use the tsc for performance monitoring and we want to prevent instruction reordering. These are our options: 1: rdtscp is a serializing call. It prevents reordering around the call to rdtscp. __asm__ __volatile__("rdtscp;…
Steve Lorimer
  • 27,059
  • 17
  • 118
  • 213
60
votes
5 answers

How to get the CPU cycle count in x86_64 from C++?

I saw this post on SO which contains C code to get the latest CPU Cycle count: CPU Cycle count based profiling in C/C++ Linux x86_64 Is there a way I can use this code in C++ (windows and linux solutions welcome)? Although written in C (and C being…
user997112
  • 29,025
  • 43
  • 182
  • 361
38
votes
6 answers

rdtsc accuracy across CPU cores

I am sending network packets from one thread and receiving replies on a 2nd thread that runs on a different CPU core. My process measures the time between send & receive of each packet (similar to ping). I am using rdtsc for getting…
avner
  • 775
  • 1
  • 8
  • 11
35
votes
1 answer

Lost Cycles on Intel? An inconsistency between rdtsc and CPU_CLK_UNHALTED.REF_TSC

On recent CPUs (at least the last decade or so) Intel has offered three fixed-function hardware performance counters, in addition to various configurable performance counters. The three fixed counters…
BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
20
votes
4 answers

How to count clock cycles with RDTSC in GCC x86?

With Visual Studio I can read the clock cycle count from the processor as shown below. How do I do the same thing with GCC? #ifdef _MSC_VER // Compiler: Microsoft Visual Studio #ifdef _M_IX86 // Processor: x86 …
Johan Råde
  • 20,480
  • 21
  • 73
  • 110
20
votes
3 answers

Getting cpu cycles using RDTSC - why does the value of RDTSC always increase?

I want to get the CPU cycles at a specific point. I use this function at that point: static __inline__ unsigned long long rdtsc(void) { unsigned long long int x; __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x)); // broken for 64-bit…
user1106106
  • 277
  • 2
  • 7
  • 13
20
votes
9 answers

Negative clock cycle measurements with back-to-back rdtsc?

I am writing a C code for measuring the number of clock cycles needed to acquire a semaphore. I am using rdtsc, and before doing the measurement on the semaphore, I call rdtsc two consecutive times, to measure the overhead. I repeat this many times,…
Discipulus
  • 245
  • 1
  • 3
  • 13
19
votes
2 answers

Why should I use 'rdtsc' differently on x86 and x86_x64?

I know that rdtsc loads the current value of the processor's time-stamp counter into the two registers: EDX and EAX. In order to get it on x86 I need to do it like that (assuming using Linux): unsigned long lo, hi; asm( "rdtsc" : "=a" (lo),…
mazix
  • 2,540
  • 8
  • 39
  • 56
18
votes
3 answers

"cpuid" before "rdtsc"

Sometimes I encounter code that reads TSC with rdtsc instruction, but calls cpuid right before. Why is calling cpuid necessary? I realize it may have something to do with different cores having TSC values, but what exactly happens when you call…
Alex B
  • 82,554
  • 44
  • 203
  • 280
15
votes
3 answers

Variance in RDTSC overhead

I'm constructing a micro-benchmark to measure performance changes as I experiment with the use of SIMD instruction intrinsics in some primitive image processing operations. However, writing useful micro-benchmarks is difficult, so I'd like to first…
John Bartholomew
  • 6,428
  • 1
  • 30
  • 39
14
votes
0 answers

Determine TSC frequency on Linux

Given an x86 with a constant TSC, which is useful for measuring real time, how can one convert between the "units" of TSC reference cycles and normal human real-time units like nanoseconds using the TSC calibration factor calculated by Linux at…
BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
13
votes
3 answers

On a cpu with constant_tsc and nonstop_tsc, why does my time drift?

I am running this test on a cpu with constant_tsc and nonstop_tsc $ grep -m 1 ^flags /proc/cpuinfo | sed 's/ /\n/g' | egrep "constant_tsc|nonstop_tsc" constant_tsc nonstop_tsc Step 1: Calculate the tick rate of the tsc: I calculate _ticks_per_ns as…
Steve Lorimer
  • 27,059
  • 17
  • 118
  • 213
11
votes
7 answers

CPU Cycle count based profiling in C/C++ Linux x86_64

I am using the following code to profile my operations to optimize on cpu cycles taken in my functions. static __inline__ unsigned long GetCC(void) { unsigned a, d; asm volatile("rdtsc" : "=a" (a), "=d" (d)); return ((unsigned long)a) |…
Humble Debugger
  • 4,439
  • 11
  • 39
  • 56
9
votes
5 answers

rdtsc, too many cycles

#include static inline unsigned long long tick() { unsigned long long d; __asm__ __volatile__ ("rdtsc" : "=A" (d) ); return d; } int main() { long long res; res=tick(); res=tick()-res; …
eXXXXXXXXXXX2
  • 1,540
  • 1
  • 18
  • 32
9
votes
2 answers

Is Intel's timestamp reading asm code example using two more registers than are necessary?

I'm looking into measuring benchmark performance using the time-stamp register (TSR) found in x86 CPUs. It's a useful register, since it measures in a monotonic unit of time which is immune to the clock speed changing. Very cool. Here is an Intel…
Edd Barrett
  • 3,425
  • 2
  • 29
  • 48