Calling cpuid before rdtsc to prevent out of order?

Question

I am trying to call cpuid before my rdtsc function to prevent out-of-order execution. I initially used this rdtsc function to take two timestamps, and I often get negative numbers, which is undesirable. This is the rdtsc function; how should I add cpuid to it? Or should cpuid be called in the main function?

inline uint64_t rdtsc() {
    unsigned long a, d;
    asm volatile ("rdtsc":"=a" (a), "=d" (d));
    return a | ((uint16_t)d << 32);
}

Some information in [this question](https://stackoverflow.com/questions/12065721/why-isnt-rdtsc-a-serializing-instruction). There are many others as well that deal with this issue. — 500 - Internal Server Error, Sep 28 '21 at 06:59
Usually you want `lfence` as a barrier to OoO exec, not a slow CPUID. See also [How to get the CPU cycle count in x86\_64 from C++?](https://stackoverflow.com/q/13772567) for working code to run `rdtsc` which avoids truncating the high 32 bits to 16-bit. — Peter Cordes, Sep 28 '21 at 15:27
See also [What's up with the "half fence" behavior of rdtscp?](https://stackoverflow.com/q/52158572) (And the list of duplicates linked at the top of the page. Some of them show `_mm_lfence()`, or putting `lfence; rdtsc` into one asm template.) — Peter Cordes, Sep 28 '21 at 15:28
Thank you very much. Can we describe lfence/mfence as a more focused way of serializing instructions as opposed to using cpuid? I noticed that it is slower to use cpuid, measurements take longer. — MitandGrit, Sep 29 '21 at 21:45

score 1 · Answer 1 · answered Sep 28 '21 at 11:29

The behavior of (uint16_t) d << 32 is not defined by the C standard.

The left operand of << is (uint16_t) d. After the cast, the integer promotions are performed, so the uint16_t value is converted to an int.¹ An int is likely 32 bits wide in your C implementation. The C standard does not define the behavior of << when the shift amount equals or exceeds the width of the promoted left operand.
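To see the promotion for yourself, here is a small sketch assuming a C11 compiler (for _Generic) and an implementation where int is wider than 16 bits. The helper names shift_result_is_int and shift_result_bits are made up for illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if shifting a uint16_t yields an int,
 * i.e. the left operand was promoted before the shift. */
static int shift_result_is_int(void) {
    uint16_t d = 1;
    return _Generic(d << 1, int: 1, default: 0);
}

/* Hypothetical helper: storage size in bits of the shift result.
 * With a 32-bit int this is 32, so a shift count of 32 is out of range. */
static int shift_result_bits(void) {
    uint16_t d = 1;
    return (int)(sizeof(d << 1) * CHAR_BIT);
}
```

If shift_result_bits() reports 32, then any shift count of 32 or more on that operand has undefined behavior.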

To fix this code, use return a | (uint64_t) d << 32;.

Most compilers warn about this. Pay attention to compiler warning messages. Preferably, elevate them to errors. (With GCC or Clang, use -Werror. With MSVC, use /WX.)
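Putting the fix together, here is a minimal sketch for GCC or Clang on x86. The names combine_tsc and rdtsc_fenced are made up for illustration, and the lfence before rdtsc follows the suggestion in the comments on the question rather than anything required by the shift fix.

```c
#include <stdint.h>

/* Hypothetical helper: joins the two 32-bit halves that rdtsc returns.
 * Both halves are widened to 64 bits before the shift, so a shift
 * count of 32 is well defined and no high bits are lost. */
static inline uint64_t combine_tsc(uint32_t lo, uint32_t hi) {
    return (uint64_t)lo | ((uint64_t)hi << 32);
}

#if defined(__x86_64__) || defined(__i386__)
/* Reads the time-stamp counter with lfence as a barrier in front of it,
 * keeping both instructions in one asm template. The "memory" clobber
 * stops the compiler from moving memory accesses across the statement. */
static inline uint64_t rdtsc_fenced(void) {
    uint32_t lo, hi;
    __asm__ volatile ("lfence\n\trdtsc"
                      : "=a" (lo), "=d" (hi)
                      :
                      : "memory");
    return combine_tsc(lo, hi);
}
#endif
```

The cast in combine_tsc is the actual fix. Whether a second fence after rdtsc is also needed depends on what is being timed.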

Footnote

¹ This assumes int is wider than 16 bits. If int is only 16 bits, (uint16_t) d << 32 is still undefined because the shift amount, 32, exceeds the width of the left operand type, 16.
