19

I know that rdtsc loads the current value of the processor's time-stamp counter into two registers, EDX and EAX. To read it on x86 I need to do this (assuming Linux):

    unsigned long lo, hi;
    asm( "rdtsc" : "=a" (lo), "=d" (hi));
    return lo;

and for x86_64:

        unsigned long lo, hi;
        asm( "rdtsc" : "=a" (lo), "=d" (hi) ); 
        return( lo | (hi << 32) );

Why is that? Can anybody explain it to me?

Cœur
mazix
  • Those definitions are missing `volatile` on the asm; they're not safe for timing if the compiler can see the start and end. I wonder if that's intentional in Linux because they're never using it for microbenchmarking inside the kernel? But IDK where in Linux it could usefully CSE. – Peter Cordes Aug 18 '18 at 14:16
  • TL:DR: **you shouldn't use it differently**. See [Get CPU cycle count?](https://stackoverflow.com/q/13772567) for asm that works on both 32 and 64-bit. (And my answer which shows how to use the `__rdtsc()` intrinsic instead). – Peter Cordes Aug 18 '18 at 14:20

2 Answers

13

RDTSC always writes its 64-bit result split into hi/lo halves in EDX and EAX, even in 64-bit mode (see the manual). Unfortunately it does not pack the 64-bit TSC into just RAX. That's why extra work is needed after the asm statement.

To make a single 64-bit integer from it, you need to shift hi to the place it belongs as part of an unsigned long. lo is already in the right place, and writing those 32-bit registers zeroes the upper bits of both full 64-bit registers, so we can just OR the (shifted) halves together without having to AND the low half.
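As a minimal sketch of that shift-and-OR step on its own, with no inline asm involved: `combine_tsc` is a hypothetical helper name, and the fixed-width types are my choice rather than what the kernel code uses.

```c
#include <stdint.h>

/* Hypothetical helper: rebuild the 64-bit TSC value from the two 32-bit
   halves that RDTSC leaves in EAX (lo) and EDX (hi). */
static inline uint64_t combine_tsc(uint32_t lo, uint32_t hi)
{
    /* Widen hi to 64 bits before shifting: shifting a 32-bit value
       by 32 would be undefined behaviour in C. */
    return ((uint64_t)hi << 32) | lo;
}
```

Because `lo` is a 32-bit type here, its upper bits are zero after widening, so no masking is needed before the OR.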

In x86-64 Linux, unsigned long is a 64-bit type so the kernel actually uses both halves of the RDTSC return value.

The only reason the 32-bit version is simpler is that the kernel is truncating the result to 32-bit by throwing away the high half. If you do want a 64-bit TSC in 32-bit mode, the same C source works there, too (with uint64_t or unsigned long long), although it wouldn't compile to shift and OR instructions. The compiler would just know that it has a 64-bit integer whose halves are in EDX and EAX.

See also How to get the CPU cycle count in x86_64 from C++? - and for real use, don't forget to make these asm volatile. Otherwise the compiler can assume that repeated executions of this produce the same output, e.g. end-start = 0 after optimization.
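A sketch of a version that works the same way in 32-bit and 64-bit mode, assuming GCC or Clang on an x86 target. `read_tsc` is a hypothetical name, not the kernel's function.

```c
#include <stdint.h>

/* Hypothetical helper; assumes GCC/Clang extended asm on x86 or x86-64. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    /* volatile: the compiler must not merge or drop repeated executions,
       since each one is expected to produce a different value. */
    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    /* Cast before shifting so this is correct where long is 32 bits. */
    return ((uint64_t)hi << 32) | lo;
}
```

Calling it twice and subtracting (`end - start`) gives a difference in TSC ticks. For real measurements, the `__rdtsc()` intrinsic mentioned in the comments is usually the simpler choice.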

Peter Cordes
mvv1277
  • So am I absolutely right, that `rdtsc` loads the current value of the processor's time-stamp counter into the two registers: EDX and EAX, not into registers from EAX to EDX? (EAX, EBX, ECX, EDX) – mazix Jul 01 '13 at 11:07
  • rdtsc always returns a 64-bit value, so on a 32-bit machine it stores it in EDX and EAX. And yes, you are right. – mvv1277 Jul 01 '13 at 11:19
10

The difference is not in rdtsc, but in what the Linux kernel wants to do with it.

In 32-bit mode, the kernel function returns a 32-bit value, so the value in eax is good enough.
In 64-bit mode, it returns a 64-bit value, so it needs to combine the values from both registers.

ugoren
  • According to the [gcc doc](https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html), even on 32-bit OSes, a 64-bit value is returned for `__asm__ __volatile__("rdtsc":"=A"(tick))`. Search for `rdtsc` in the referenced link. – wlnirvana Apr 28 '18 at 08:52
  • @wlnirvana, this would be a good way to get a 64bit timestamp on 32bit. But Linux chose to use `long`, so only 32 bits are needed. – ugoren Apr 28 '18 at 11:35
  • @ugoren could you please elaborate on "Linux chose to use `long`"? – wlnirvana Apr 29 '18 at 15:00
  • @wlnirvana, the function returns `long`, which is 32bit on a 32bit system. This is how the developers chose to define it. – ugoren Apr 29 '18 at 18:21
  • You mean the function the OP used which returns `lo`? Yes that is of course 32bit. But if the style in the gcc doc is used, I think 64bit value would be returned? – wlnirvana Apr 30 '18 at 03:11
  • @wlnirvana: `return( lo | ((uint64_t)hi << 32) )` works in both modes. No reason to mess around with `"=A"`. See Mysticial's answer on [Get CPU cycle count?](https://stackoverflow.com/q/13772567). – Peter Cordes Aug 18 '18 at 14:18