
Disclaimer: Words cannot describe how much I detest AT&T-style syntax.

I have a problem that I hope is caused by register clobbering. If not, I have a much bigger problem.

The first version I used was:

static unsigned long long rdtscp(void)
{
    unsigned int hi, lo;
    __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi));
    return (unsigned long long)lo | ((unsigned long long)hi << 32);
}

I notice there is no clobber list in this version. Whether that is a problem I don't know... I suppose it depends on whether the compiler inlines the function. Using this version causes me problems that aren't always reproducible.

The next version I found is:

static unsigned long long rdtscp(void)
{
    unsigned long long tsc;
    __asm__ __volatile__(
        "rdtscp;"
        "shl $32, %%rdx;"
        "or %%rdx, %%rax"
        : "=a"(tsc)
        :
        : "%rcx", "%rdx");

    return tsc;
}

This is reassuringly unreadable and official-looking, but as I said, my issue isn't always reproducible, so I'm merely trying to rule out one possible cause of my problem.

The reason I believe the first version is a problem is that it is overwriting a register that previously held a function parameter.

Which is correct: version 1, version 2, or both?

James
  • Share your hate of the Syntax. When I get to this point, I look for compiler intrinsic functions or just place the functions in a .s file and assemble them myself... – Michael Dorgan Feb 09 '13 at 01:04
  • @MichaelDorgan VC++ offers a lovely intrinsic, gcc unfortunately doesn't. – James Feb 09 '13 at 01:07
  • +1 just for 'detest AT&T style syntax' – Martin James Feb 09 '13 at 09:05
  • I think you can very safely refer to [this document](http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf), not only to validate your code but also for a more in-depth analysis of the accuracy of `RDTSC` and `RDTSCP`. I hope this helps. – user1082170 Sep 03 '15 at 11:39
  • You don't need inline asm with modern compilers (like gcc4.5 or newer). See [Get CPU cycle count?](https://stackoverflow.com/a/51907627) for a fully portable rdtscp and rdtsc using the intrinsic on gcc/clang/MSVC/ICC – Peter Cordes Aug 18 '18 at 11:34
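Following Peter Cordes' suggestion, a minimal sketch of the intrinsic route might look like this. It assumes GCC or clang (which declare `__rdtscp(unsigned int *)` in `<x86intrin.h>`) or MSVC (which declares it in `<intrin.h>`); the wrapper name `read_tscp` is hypothetical. Because the compiler itself knows which registers the intrinsic writes, there is no clobber list to get wrong.

```cpp
#include <cstdint>
#ifdef _MSC_VER
#include <intrin.h>     // MSVC declares __rdtscp here
#else
#include <x86intrin.h>  // GCC/clang declare __rdtscp here
#endif

// Hypothetical wrapper: returns the TSC and stores the IA32_TSC_AUX value in aux.
// The compiler models the registers rdtscp writes (EAX, EDX, ECX) on its own.
static inline uint64_t read_tscp(uint32_t &aux)
{
    unsigned int a = 0;
    uint64_t tsc = __rdtscp(&a);
    aux = a;
    return tsc;
}
```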

2 Answers


Here's C++ code that returns the TSC and stores the auxiliary 32 bits (processor ID) in the reference parameter:

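#include <cstdint>

static inline uint64_t rdtscp( uint32_t & aux )
{
    uint64_t rax,rdx;
    // rdtscp writes EDX:EAX (TSC) and ECX (aux); declaring all three as
    // outputs tells the compiler they are overwritten
    asm volatile ( "rdtscp\n" : "=a" (rax), "=d" (rdx), "=c" (aux) : : );
    return (rdx << 32) + rax;
}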

It is better to do the shift and add that merge the two 32-bit halves in a C++ statement rather than in the inline asm; this allows the compiler to schedule those instructions as it sees fit.
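To illustrate the point about merging in C++, here is a sketch assuming GCC/clang extended asm on x86-64 (`combine_halves` and `rdtscp32` are hypothetical names). The asm statement only produces the three registers, and the merge is ordinary C++ the optimizer can see. Since `lo` is below 2^32 it cannot overlap `hi << 32`, so `|` and the answer's `+` give the same result. Note the widening cast before the shift: shifting a 32-bit value by 32 is undefined behavior.

```cpp
#include <cstdint>

// Merge the EDX:EAX halves in plain C++ so the compiler can schedule it.
static inline uint64_t combine_halves(uint32_t hi, uint32_t lo)
{
    // Widen hi before shifting; a 32-bit shift by 32 would be undefined.
    return (static_cast<uint64_t>(hi) << 32) | lo;
}

// rdtscp with 32-bit outputs. ECX is declared as an output ("=c"),
// so the compiler knows it is overwritten and keeps no live data there.
static inline uint64_t rdtscp32(uint32_t &aux)
{
    uint32_t lo, hi;
    asm volatile("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));
    return combine_halves(hi, lo);
}
```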

Update, about aux: the RDTSCP instruction returns the TSC in two registers (EDX:EAX) and the processor ID (aux) in a third register, ECX; the RDTSC instruction returns only the TSC. The processor ID is read from an MSR (Model-Specific Register), IA32_TSC_AUX, which must be initialized by privileged system software; its purpose is to identify which "core" is executing the instruction. The value is therefore OS-dependent.

See http://felixcloutier.com/x86/rdtscp
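As a hedged illustration of what aux is good for: if two reads return the same aux, they most likely ran on the same core (assuming the OS writes a distinct value per core), so their difference is not polluted by a migration. The decoding function assumes the Linux x86 kernel's encoding of IA32_TSC_AUX as `(node << 12) | cpu`; other operating systems may store something else. All names here are hypothetical, and the code assumes GCC or clang.

```cpp
#include <cstdint>
#include <x86intrin.h>  // GCC/clang: __rdtscp

struct CpuNode {
    uint32_t cpu;
    uint32_t node;
};

// ASSUMPTION: Linux stores (node << 12) | cpu in IA32_TSC_AUX.
static inline CpuNode decode_tsc_aux_linux(uint32_t aux)
{
    return CpuNode{aux & 0xFFFu, aux >> 12};
}

// Times work() between two rdtscp reads. Returns true if both reads reported
// the same aux value (same core, assuming per-core values), false otherwise.
template <typename F>
static bool timed_on_one_core(F &&work, uint64_t &cycles)
{
    unsigned int aux_start = 0, aux_end = 0;
    uint64_t start = __rdtscp(&aux_start);
    work();
    uint64_t end = __rdtscp(&aux_end);
    cycles = end - start;
    return aux_start == aux_end;
}
```

A caller would typically discard (or retry) any measurement for which `timed_on_one_core` returns false.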

amdn
  • Does this not also clobber ECX? If not, I'll just delete my answer and call yours good. – Michael Dorgan Feb 09 '13 at 01:21
  • Yes, the "=c" specification tells the compiler that ECX will hold the output, which implies that it is clobbered – amdn Feb 09 '13 at 01:22
  • Thank you. The first version didn't mark `ecx` as clobbered. This register initially held a parameter value, which was used in a conditional that called `std::terminate()` if it failed. When `ecx` was clobbered, the condition was obviously checking the wrong thing! – James Feb 09 '13 at 16:25
  • Sometimes useful to have references as to where else this type of code is used ... hence, for example, have a look at http://lxr.free-electrons.com/source/arch/x86/include/asm/msr.h#L205 and http://lxr.free-electrons.com/source/arch/x86/include/asm/msr.h#L42 (the opcode for `rdtscp` is the byte sequence given there). – FrankH. Feb 14 '13 at 14:12
  • If you don't need the value written to `%ecx` (which can be used to identify CPU cores), you can simply use the clobber list: `__asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi) : : "%ecx" );` – nodakai Oct 30 '14 at 09:04
  • Is there a concise explanation of why we need the function's `aux` parameter? – haelix Nov 01 '18 at 20:15
  • @amdn What exactly is the purpose of `aux` if the function returns the TSC? – intrigued_66 May 07 '23 at 18:25
  • @intrigued_66 the RDTSCP instruction returns the TSC in two registers (EDX:EAX) and the processor ID (aux) in a third register, ECX; the RDTSC instruction returns only the TSC. The processor ID is read from an MSR (Model-Specific Register), IA32_TSC_AUX, which must be initialized by privileged system software; its purpose is to identify which "core" is executing the instruction. The value is therefore OS-dependent. See https://www.felixcloutier.com/x86/rdtscp. – amdn May 10 '23 at 05:56
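Applied to the question's version 1, the clobber-list fix from nodakai's comment gives roughly the following sketch (GCC/clang extended asm assumed; `rdtscp_noaux` and `fourth_arg_survives` are hypothetical names). The second function loosely mimics James's situation: it keeps parameters live across the asm, and with `"ecx"` listed as clobbered the compiler may not leave one of them in ECX/RCX (where the x86-64 System V ABI passes the fourth integer argument) across the instruction.

```cpp
#include <cstdint>

// Question's version 1 with ECX added to the clobber list: rdtscp writes
// ECX (IA32_TSC_AUX) even though this function never reads it.
static inline uint64_t rdtscp_noaux()
{
    uint32_t hi, lo;
    __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi) : : "ecx");
    return (static_cast<uint64_t>(hi) << 32) | lo;
}

// Hypothetical check: 'd' is the 4th integer argument (RCX on x86-64
// System V). Its value must still be correct after the asm statement.
__attribute__((noinline)) static bool fourth_arg_survives(int a, int b, int c, int d)
{
    uint64_t t = rdtscp_noaux();
    return t != 0 && a == 1 && b == 2 && c == 3 && d == 42;
}
```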

According to this, this operation clobbers EDX and ECX. You need to mark those registers as clobbered, which is what the second one does. BTW, is this the link where you got the above code, or did you find it elsewhere? It also shows a few other variations for timing, which is pretty neat.

Michael Dorgan
  • Wrong instruction. It's `rdtscp`, not `rdtsc`, and any output is known to be clobbered, so it doesn't need to be listed. The problem is that `rdtscp` *also* destroys ecx, which version 2 marks as clobbered but version 1 does not. – ughoavgfhw Feb 09 '13 at 01:13
  • Leaving this here for the other SO link which may be useful - though the above answer is better. – Michael Dorgan Feb 09 '13 at 01:25