I am trying to call cpuid before my rdtsc function to prevent out-of-order execution. I initially used this rdtsc function to get two timestamps, and I often get negative numbers, which is undesirable. This is the rdtsc function; how should I implement cpuid? Or should it be called in the main function?

inline uint64_t rdtsc() {
    unsigned long a, d;
    asm volatile ("rdtsc":"=a" (a), "=d" (d));
    return a | ((uint16_t)d << 32);
}
phuclv
  • Some information in [this question](https://stackoverflow.com/questions/12065721/why-isnt-rdtsc-a-serializing-instruction). There are many others as well that deal with this issue. – 500 - Internal Server Error Sep 28 '21 at 06:59
  • Usually you want `lfence` as a barrier to OoO exec, not a slow CPUID. See also [How to get the CPU cycle count in x86\_64 from C++?](https://stackoverflow.com/q/13772567) for working code to run `rdtsc` which avoids truncating the high 32 bits to 16-bit. – Peter Cordes Sep 28 '21 at 15:27
  • See also [What's up with the "half fence" behavior of rdtscp?](https://stackoverflow.com/q/52158572) (And the list of duplicates linked at the top of the page. Some of them show `_mm_lfence()`, or putting `lfence; rdtsc` into one asm template.) – Peter Cordes Sep 28 '21 at 15:28
  • Thank you very much. Can we describe lfence/mfence as a more focused way of serializing instructions as opposed to using cpuid? I noticed that it is slower to use cpuid, measurements take longer. – MitandGrit Sep 29 '21 at 21:45

1 Answer

The behavior of (uint16_t) d << 32 is not defined by the C standard.

The left operand of << is (uint16_t) d. After the cast, the integer promotions are performed, so the uint16_t value is converted to an int (see footnote 1). An int is likely 32 bits in your C implementation. The C standard does not define the behavior of << when the shift amount equals or exceeds the width of the promoted left operand.
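To make the promotion concrete, here is a minimal sketch that asks the compiler what type the shift is performed in. The helper names `shift_operand_is_int` and `promoted_size_bits` are made up for illustration, and the code assumes a C11 compiler (for `_Generic`). Neither operand is evaluated, so no undefined shift takes place.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if a uint16_t left operand of << has been promoted to int. */
static int shift_operand_is_int(void) {
    uint16_t d = 1;
    /* _Generic only inspects the type of d << 1; it does not evaluate it. */
    return _Generic(d << 1, int: 1, default: 0);
}

/* Storage size in bits of the promoted operand. This equals its width when
   the type has no padding bits; shift counts must stay below the width. */
static unsigned promoted_size_bits(void) {
    uint16_t d = 1;
    /* sizeof does not evaluate its operand either. */
    return (unsigned)(sizeof(d << 1) * CHAR_BIT);
}
```

On a typical implementation with a 32-bit int, the first helper reports that the operand is an int and the second reports 32, which is why a shift count of 32 is out of range.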

To fix this code, use return a | (uint64_t) d << 32;.
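Putting the fix together with the `lfence` suggestion from the comments on the question, a corrected function might look like the sketch below. The names `combine_halves` and `rdtsc_fenced` are hypothetical, the inline assembly assumes GCC or Clang on x86, and whether a single `lfence` in front of `rdtsc` is enough ordering for a given measurement is an assumption that the linked questions discuss in more detail.

```c
#include <stdint.h>

/* Combine the two 32-bit halves that rdtsc leaves in EAX (lo) and EDX (hi). */
static inline uint64_t combine_halves(uint32_t lo, uint32_t hi) {
    /* Widen to 64 bits before shifting so the shift count of 32 is in range. */
    return (uint64_t)lo | ((uint64_t)hi << 32);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Read the time-stamp counter with an lfence in front of it, as suggested in
   the comments instead of cpuid. Assumption: GCC/Clang extended asm on x86. */
static inline uint64_t rdtsc_fenced(void) {
    uint32_t lo, hi;
    __asm__ volatile ("lfence\n\trdtsc" : "=a" (lo), "=d" (hi) : : "memory");
    return combine_halves(lo, hi);
}
#endif
```

Keeping the arithmetic in a separate function means the shift can be checked without executing `rdtsc` at all.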

Most compilers warn about this. Pay attention to compiler warning messages. Preferably, elevate them to errors. (With GCC or Clang, use -Werror. With MSVC, use /WX.)

Footnote

1 This assumes int is wider than 16 bits. If int is only 16 bits, (uint16_t) d is promoted to unsigned int instead, and (uint16_t) d << 32 is still undefined because the shift amount, 32, exceeds the 16-bit width of the left operand.

Eric Postpischil