A total time over many iterations is much easier to measure accurately and reliably, and the measurement overhead becomes irrelevant. It's what I'd recommend, as long as you're sure you can stop your compiler from optimizing across iterations of whatever you're measuring. (Check the generated asm if necessary.)
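As a minimal sketch of the total-time approach: time the whole loop once and divide by the iteration count. Here `work()` and `avg_ns_per_call()` are hypothetical names, not anything from the question, and the empty `asm volatile` statement is a GNU C/C++ extension (GCC/Clang) that I'm assuming is available as the optimization barrier.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the operation being measured.
inline std::uint64_t work(std::uint64_t x) {
    return x * 6364136223846793005ULL + 1442695040888963407ULL;
}

// Average nanoseconds per call, from one total-time measurement over `iters` calls.
double avg_ns_per_call(std::uint64_t iters) {
    std::uint64_t x = 1;
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iters; ++i) {
        x = work(x);
        // Empty asm that claims to read and modify x (GNU extension, assumed):
        // the compiler has to produce x in a register every iteration and
        // can't fold the loop into a closed form or delete it.
        asm volatile("" : "+r"(x));
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> total = stop - start;
    return total.count() / static_cast<double>(iters);
}
```

Each call depends on the previous result here, so this particular loop measures latency rather than throughput. With something like `avg_ns_per_call(100000000)`, the two clock reads are a negligible part of the total.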
If you think your runtime might be data-dependent and want to look into variation across iterations, then you might consider recording timestamps somehow. But 300 ns is only ~1k clock cycles on a 3.3GHz CPU, and recording a timestamp takes some time. So you definitely need to worry about measurement overhead.
Assuming you're on x86, raw rdtsc around each operation is pretty lightweight, but out-of-order execution can reorder the timestamps with the work. See the related Q&As "Get CPU cycle count?" and "clflush to invalidate cache line via C function".
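A rough sketch of what raw per-iteration timestamps could look like, assuming GCC or Clang on x86-64 where `<x86intrin.h>` provides `__rdtsc()`. `work()` and `record_timestamps()` are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#include <x86intrin.h>  // __rdtsc (assumes GCC/Clang on x86)

// Hypothetical stand-in for one iteration of the workload.
inline std::uint64_t work(std::uint64_t x) {
    return x * 6364136223846793005ULL + 1442695040888963407ULL;
}

// Returns n+1 raw TSC readings: one before each iteration, plus one after the last.
std::vector<std::uint64_t> record_timestamps(std::size_t n) {
    // Allocate (and zero) the buffer up front so the timed loop doesn't allocate.
    std::vector<std::uint64_t> ts(n + 1);
    std::uint64_t x = 1;
    for (std::size_t i = 0; i < n; ++i) {
        ts[i] = __rdtsc();           // no fences: the CPU may reorder this with the work
        x = work(x);
        asm volatile("" : "+r"(x));  // compiler-only barrier so work() isn't optimized away
    }
    ts[n] = __rdtsc();
    return ts;
}
```

The difference `ts[i + 1] - ts[i]` is then a noisy estimate for iteration `i`, in TSC ticks. On modern x86 those are reference cycles at a fixed frequency, not core clock cycles, and each difference includes the cost of the rdtsc and the store.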
Using lfence; rdtsc; lfence to stop the timing from reordering with each iteration of the workload will also block out-of-order execution across the steps of the workload, distorting things. (The out-of-order execution window on Skylake is a ROB size of 224 uops. At 4 per clock that's about 56 cycles, a small fraction of 1k clock cycles, but in lower-throughput code with stalls for cache misses there could be significant overlap between independent iterations.)
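For reference, a fenced read could be sketched like this, again assuming GCC or Clang on x86-64 (`_mm_lfence()` and `__rdtsc()` from `<x86intrin.h>`). `fenced_rdtsc()` and `min_fenced_overhead()` are hypothetical helper names; the second one just gives a rough floor on what each fenced timestamp costs.

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtsc, _mm_lfence (assumes GCC/Clang on x86)

// lfence; rdtsc; lfence
inline std::uint64_t fenced_rdtsc() {
    _mm_lfence();                 // let earlier instructions finish before reading the TSC
    std::uint64_t t = __rdtsc();
    _mm_lfence();                 // hold back later instructions until the TSC has been read
    return t;
}

// Smallest delta seen between two back-to-back fenced reads, in TSC ticks.
std::uint64_t min_fenced_overhead(int trials) {
    std::uint64_t best = UINT64_MAX;
    for (int i = 0; i < trials; ++i) {
        std::uint64_t t0 = fenced_rdtsc();
        std::uint64_t t1 = fenced_rdtsc();
        if (t1 - t0 < best) best = t1 - t0;
    }
    return best;
}
```

If that floor is a noticeable fraction of the ~1k cycles per operation, subtracting it out only partly helps, because the fences also change how the workload itself executes.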
Any standard timing function like C++ std::chrono will normally call library functions that ultimately use rdtsc, but with many extra instructions. Or worse, it will make an actual system call, taking well over a hundred clock cycles to enter/leave the kernel, and more with Meltdown+Spectre mitigation enabled.
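One way to see what a clock call costs on your system is to time a large number of back-to-back calls as a whole. This is only a sketch; `avg_now_cost_ns()` is a hypothetical name, and the result depends on the OS, the libc, and the clock source in use.

```cpp
#include <chrono>

// Average cost in nanoseconds of one steady_clock::now() call,
// estimated by timing `calls` back-to-back calls as a whole.
double avg_now_cost_ns(int calls) {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    auto last = start;
    for (int i = 0; i < calls; ++i) {
        last = clock::now();  // the call being measured
    }
    std::chrono::duration<double, std::nano> total = last - start;
    return total.count() / calls;
}
```

Compare the result of something like `avg_now_cost_ns(1000000)` against the ~300 ns operation to judge whether per-iteration std::chrono timestamps are usable at all.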
However, one thing that might work is using Intel-PT (https://software.intel.com/en-us/blogs/2013/09/18/processor-tracing) to record timestamps on taken branches. Without blocking out-of-order exec at all, you can still get timestamps on when the loop branch in your repeat loop executed. That branch may well be independent of your workload and able to execute soon after it's issued into the out-of-order part of the core, but that can only happen a limited distance ahead of the oldest not-yet-retired instruction.