What are the drawbacks to using rdtsc to do benchmarking?

Question

I'm trying to design a benchmark for a low-latency workload (each operation is in the hundreds of nanos median). I was curious about the fidelity of designing a benchmark using rdtsc() and timestamp counter measurement.

What are the drawbacks or potential problems with this approach?

possible duplicate: my canonical answer on [Get CPU cycle count?](https://stackoverflow.com/a/51907627) covers a lot of the quirks and difficulties of RDTSC. — Peter Cordes, Oct 26 '18 at 15:41
It measures elapsed wall-clock time. Maybe that's what you want but that doesn't correlate well with how much work the processor actually did. You'll get results that look too good on small benchmarks. You can only get that measurement from the processor counters. — Hans Passant, Oct 26 '18 at 15:42

score 1 · Answer 1 · answered Oct 26 '18 at 15:44

The most serious drawback of rdtsc is that it is very hardware-specific. Even on hardware that supports the instruction, the counter might not tick at a steady rate, might be unsynchronized between cores, and could be affected by CPU state (for example, frequency changes or sleep states).

Generally, if your CPU supports constant_tsc, nonstop_tsc and tsc_known_freq (the feature flags Linux reports in /proc/cpuinfo), using rdtsc for latency measurements should work well.
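
As a rough way to check for those three flags on Linux, the sketch below scans the `flags` line of /proc/cpuinfo. The function names `has_flag` and `tsc_looks_reliable` are made up for illustration, and the check assumes the kernel exposes the features under exactly these names; it does not apply to non-Linux systems.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `flag` appears in `line` as a whole whitespace-separated
 * token, 0 otherwise. A substring such as "tsc" inside "constant_tsc"
 * does not count as a match. */
int has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    if (n == 0)
        return 0;

    const char *p = line;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p += n; /* keep searching after this partial match */
    }
    return 0;
}

/* Hypothetical helper: returns 1 if the first "flags" line in
 * /proc/cpuinfo lists all three TSC features, 0 if any is missing,
 * and -1 if the file cannot be read or has no "flags" line. */
int tsc_looks_reliable(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1;

    char line[8192];
    int result = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            result = has_flag(line, "constant_tsc") &&
                     has_flag(line, "nonstop_tsc") &&
                     has_flag(line, "tsc_known_freq");
            break;
        }
    }
    fclose(f);
    return result;
}
```

A return value of 1 only means the kernel reports these features on the first listed CPU; it is not a guarantee about measurement quality.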

SergeyA

All recent hardware *does* have those features, though. I think you'd need something older than Nehalem or Core 2 for it to be a problem, other than de-sync between cores. (And you solve that by pinning threads to cores for microbenchmarking, even if you don't do that in your real application.) – Peter Cordes Oct 26 '18 at 15:47
@PeterCordes I do not disagree with that, but absent any specific hardware tags, I had to make as generic an answer as possible. – SergeyA Oct 26 '18 at 18:48
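
To illustrate the pinning advice in the comments, here is a minimal sketch of timing a function in TSC ticks on x86 with GCC or Clang. It assumes the process is pinned to one core from outside (for example, `taskset -c 2 ./bench` on Linux), so both counter reads come from the same core. The names `busy_work`, `time_ticks` and `min_ticks` are hypothetical, and `lfence` is assumed to act as a barrier around `rdtsc`, which depends on the CPU. The result is in ticks, not nanoseconds; converting would require knowing the TSC frequency.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc, _mm_lfence (x86 only) */

static volatile uint64_t sink; /* volatile so the loop is not optimized away */

/* Placeholder workload standing in for the operation being benchmarked. */
void busy_work(void)
{
    for (int i = 0; i < 1000; i++)
        sink += (uint64_t)i;
}

/* Times a single call of fn and returns the elapsed TSC ticks.
 * The fences are intended to keep the counter reads from being
 * reordered around the timed call. */
uint64_t time_ticks(void (*fn)(void))
{
    _mm_lfence();
    uint64_t start = __rdtsc();
    _mm_lfence();

    fn();

    _mm_lfence();
    uint64_t end = __rdtsc();
    _mm_lfence();

    return end - start;
}

/* Repeats the measurement `runs` times and returns the smallest value,
 * as one simple way to filter out interrupts and other noise.
 * Returns UINT64_MAX if runs <= 0. */
uint64_t min_ticks(void (*fn)(void), int runs)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t t = time_ticks(fn);
        if (t < best)
            best = t;
    }
    return best;
}
```

Taking the minimum is a design choice made here for simplicity; collecting all samples and reporting a median or percentiles may suit a latency benchmark better.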
