I want to write a function that returns the current timestamp. Calling __rdtscp(&addr) directly requires passing an argument every time, so I want to write a function curTime() in a .h file that returns the current timestamp on each call. Because I don't want to lose time to function-call overhead, I define it as an inline function. One version uses rdtsc and the other uses rdtscp. Are my two implementations equivalent to calling __rdtscp(&addr) directly?

1.
static inline uint64_t curTime() {
  uint64_t a, d;
  asm volatile ("mfence");
  asm volatile("rdtsc" : "=a"(a), "=d"(d) :: "rcx");
  a = (d<<32) | a;
  asm volatile ("mfence");
  return a;
}
2.
static inline uint64_t curTime() {
  uint64_t a;
  asm volatile ("rdtscp" : "=a" (a));
  return a;
}

3.
__rdtscp(&junk)
  • Your 2nd function is obviously unsafe: you didn't tell the compiler about clobbering RDX and RCX. Your first function uses `rdtsc` (not p) so the asm statement doesn't touch RCX. But anyway, look at how these inline into real callers, e.g. on https://godbolt.org/. Also, `mfence` is very slow and not useful here (unless you want to wait for stores to complete), you want `lfence`. – Peter Cordes Nov 10 '20 at 08:30
  • last word is junk for a reason. really idk – Алексей Неудачин Nov 10 '20 at 08:41
  • Does this answer your question? [How to count clock cycles with RDTSC in GCC x86?](https://stackoverflow.com/questions/9887839/how-to-count-clock-cycles-with-rdtsc-in-gcc-x86) – vgru Nov 10 '20 at 09:31
  • @Groo: It's not quite *that* simple. An `lfence` *after* `rdtsc[p]` can help reduce measurement noise in some cases. And you usually want `lfence` before rdtsc, or use `rdtscp`, if you're timing something: letting the clock-read execute out-of-order while work is still in flight might not be what you want. Also, there can be tiny advantages to ignoring the high half of the RDTSC[P] result via inline asm, if compilers miss optimizing away some of the shift/or when you just do `uint32_t` subtraction. – Peter Cordes Nov 10 '20 at 09:43
  • [Is there any difference in between (rdtsc + lfence + rdtsc) and (rdtsc + rdtscp) in measuring execution time?](https://stackoverflow.com/q/59759596). This answer on [clflush to invalidate cache line via C function](https://stackoverflow.com/a/51830976) has some discussion in comments about lfence after rdtsc at the bottom of the timed region being apparently helpful. – Peter Cordes Nov 10 '20 at 09:44
  • @Peter Cordes In the second case, only rax is used. And just read its data. Why tell the compiler about clobbering RDX and RCX? – Gerrie Nov 10 '20 at 11:39
  • @cyj: Because [RDTSCP](https://www.felixcloutier.com/x86/rdtscp) modifies those registers, whether you want it or not!!! The compiler needs to know what registers are affected by an `asm` statement, otherwise it will assume they're unmodified and might be keeping something important in one of them. Never lie to your compiler; it causes hard-to-debug problems that appear to happen in other code. If you don't 100% understand this, stay *far* away from inline asm. Even if you *do* understand it, https://gcc.gnu.org/wiki/DontUseInlineAsm. (You can use `_mm_lfence()` and `__rdtsc()` just fine.) – Peter Cordes Nov 10 '20 at 11:43
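
The points raised in the comments above can be sketched roughly as follows. This is only an illustration, assuming GCC or Clang on x86 with `<x86intrin.h>` available; the names `curTime_lfence`, `curTime_rdtscp`, and `curTime_asm` are hypothetical and not from the question. The first two use the intrinsics suggested in the comments, and the third shows what an inline-asm `rdtscp` might look like once all three registers it writes (EAX, EDX, ECX) are declared to the compiler.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc, __rdtscp, _mm_lfence (GCC/Clang, x86) */

/* Hypothetical name. rdtsc fenced with lfence instead of mfence:
 * the first lfence waits for earlier instructions to complete before
 * the clock read; the second keeps later instructions from starting
 * until the read is done. */
static inline uint64_t curTime_lfence(void) {
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Hypothetical name. Wraps __rdtscp so callers need not pass an argument.
 * The intrinsic already returns the full 64-bit counter. */
static inline uint64_t curTime_rdtscp(void) {
    unsigned int aux;    /* receives IA32_TSC_AUX; unused here */
    return __rdtscp(&aux);
}

/* Hypothetical name. Inline-asm rdtscp with every written register
 * declared as an output, so the compiler knows EAX, EDX and ECX change. */
static inline uint64_t curTime_asm(void) {
    uint32_t lo, hi, aux;
    __asm__ volatile ("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));
    (void)aux;           /* TSC_AUX value is deliberately ignored */
    return ((uint64_t)hi << 32) | lo;
}
```

A caller would typically take two readings and subtract, e.g. `uint64_t t0 = curTime_lfence(); /* work */ uint64_t dt = curTime_lfence() - t0;`. Whether any of these compiles to the same code as a direct `__rdtscp(&junk)` call is best checked by looking at the generated asm in a real caller, as the comments recommend.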

0 Answers