
Following Intel's white paper "How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures", I use the code below:

static inline uint64_t bench_start(void)
{
    unsigned cycles_low, cycles_high;
    asm volatile("CPUID\n\t"
        "RDTSCP\n\t"
        "mov %%edx, %0\n\t"
        "mov %%eax, %1\n\t"
        : "=r" (cycles_high), "=r" (cycles_low)
        ::"%rax", "%rbx", "%rcx", "%rdx");

    return (uint64_t) cycles_high << 32 | cycles_low;
}

static inline uint64_t bench_end(void)
{
     unsigned cycles_low, cycles_high;
     asm volatile("RDTSCP\n\t"
         "mov %%edx, %0\n\t"
         "mov %%eax, %1\n\t"
         "CPUID\n\t"
         : "=r" (cycles_high), "=r" (cycles_low)
         ::"%rax", "%rbx", "%rcx", "%rdx");
     return (uint64_t) cycles_high << 32 | cycles_low;
}

However, I have also seen people use the code below:

static inline uint64_t bench_start(void)
{
   unsigned cycles_low, cycles_high;
   asm volatile("RDTSCP\n\t"
                : "=d" (cycles_high), "=a" (cycles_low));
   return (uint64_t) cycles_high << 32 | cycles_low;
}

static inline uint64_t bench_end(void)
{
   unsigned cycles_low, cycles_high;
   asm volatile("RDTSCP\n\t"
                : "=d" (cycles_high), "=a" (cycles_low));
   return (uint64_t) cycles_high << 32 | cycles_low;
}

As you know, RDTSCP is only pseudo-serializing, so why would someone use the second version? I can guess two reasons:

  • Maybe in most situations RDTSCP is enough to ensure complete in-order execution?

  • Maybe they just want to avoid the cost of CPUID, for efficiency?

JunChan
  • I can't imagine how the second implementation can be justified due to the reason you have mentioned and stated in that Intel whitepaper. Also, the `rdtscp` in your bench_start() is redundant due to the previous cpuid call. You save a byte by just calling `rdtsc`, which is the way recommended in that awesome Intel whitepaper – Gavin Portwood Jul 04 '17 at 05:29
  • **The second inline asm for `rdtscp` is unsafe. It clobbers ECX without telling the compiler.** Use a clobber, or better use the intrinsic. [Get CPU cycle count?](//stackoverflow.com/a/51907627). My answer on that Q&A also has some links to serializing before/after `rdtsc` with `lfence`. – Peter Cordes Aug 18 '18 at 15:27
  • Possible duplicate of [Get CPU cycle count?](https://stackoverflow.com/questions/13772567/get-cpu-cycle-count) – Peter Cordes Aug 18 '18 at 15:28
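
One way to read the suggestion in the comments above (use the intrinsic and fence with `lfence`) is the minimal sketch below. It assumes GCC or Clang on x86-64, where `<x86intrin.h>` provides `__rdtsc`, `__rdtscp` and `_mm_lfence`, and it assumes `lfence` orders instruction execution on the target CPU (documented for Intel; on AMD this depends on a CPU setting). The names `bench_start` and `bench_end` are simply reused from the question, and this is not the white paper's CPUID-based sequence.

```c
#include <stdint.h>
#include <x86intrin.h>   /* GCC/Clang: __rdtsc, __rdtscp, _mm_lfence */

/* Timestamp taken at the start of the timed region. */
static inline uint64_t bench_start(void)
{
    _mm_lfence();               /* let earlier instructions finish first */
    uint64_t t = __rdtsc();
    _mm_lfence();               /* keep the timed code from starting before the read */
    return t;
}

/* Timestamp taken at the end of the timed region. */
static inline uint64_t bench_end(void)
{
    unsigned aux;               /* RDTSCP writes IA32_TSC_AUX to ECX; the intrinsic
                                   exposes it here, so the compiler knows about it */
    uint64_t t = __rdtscp(&aux);   /* waits for earlier instructions to execute */
    _mm_lfence();               /* keep later code from starting before the read */
    (void)aux;
    return t;
}
```

Because the intrinsics tell the compiler which registers are written, there is no hand-maintained clobber list to get wrong. Call `bench_start()` just before the code under test and `bench_end()` just after it, then subtract the two values; the result is in TSC ticks, which are not necessarily core clock cycles.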
