This is a very interesting question that can easily send you down a rabbit hole. Basically, any CPU cycle measurement depends on your processor's and your compiler's RDTSC implementation.
For Python there is a package called hwcounter that can be used as follows:
# pip install hwcounter
from hwcounter import Timer, count, count_end
from time import sleep
# Method-1
start = count()
# Do something here:
sleep(1)
elapsed = count_end() - start
print(f'Elapsed cycles: {elapsed:,}')
# Method-2
with Timer() as t:
    # Do something here:
    sleep(1)
print(f'Elapsed cycles: {t.cycles:,}')
NOTE:
It seems that the hwcounter implementation is currently broken for Windows Python builds. A working alternative is to build the pip package with the MinGW compiler instead of MS Visual Studio.
Caveats
Results from this method always depend on how your computer schedules tasks and threads across its processors. Ideally you would:
- bind the test code to one unused processor (a.k.a. processor affinity)
- run the tests 1k to 1M times to get a good average
- have a good understanding not only of compilers, but also of how Python optimizes its code internally. Many things are not at all obvious, especially if you come from a C/C++/C# background.
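The first two points can be sketched roughly as follows. This is only an illustration under a few assumptions: os.sched_setaffinity is not available on every platform (it exists on Linux, for example), so the pinning step is skipped where it is missing; the helper names pin_to_one_cpu, measure and work are made up for this example; and if hwcounter is not installed the sketch falls back to time.perf_counter_ns, which reports nanoseconds rather than cycles.

```python
import os
import statistics
import time

try:
    # hwcounter reads the CPU's timestamp counter, so results are in cycles.
    from hwcounter import count, count_end
    UNIT = "cycles"
except ImportError:
    # Assumption: fall back to a wall-clock timer so the sketch still runs.
    # This measures nanoseconds, NOT cycles.
    count = count_end = time.perf_counter_ns
    UNIT = "ns"


def pin_to_one_cpu():
    """Try to pin this process to a single allowed CPU.

    Returns the CPU index, or None if pinning is unsupported or fails.
    Note: this cannot tell whether that CPU is otherwise idle.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        cpu = max(os.sched_getaffinity(0))
        os.sched_setaffinity(0, {cpu})
    except OSError:
        return None
    return cpu


def measure(func, repeats=1000):
    """Call func `repeats` times; return (minimum, median) elapsed counts."""
    samples = []
    for _ in range(repeats):
        start = count()
        func()
        samples.append(count_end() - start)
    return min(samples), statistics.median(samples)


def work():
    # Hypothetical workload; replace with the code under test.
    sum(range(1000))


pinned = pin_to_one_cpu()
lowest, median = measure(work, repeats=1000)
print(f"pinned to CPU: {pinned}")
print(f"min: {lowest:,} {UNIT}, median: {median:,.0f} {UNIT}")
```

Reporting the minimum and the median rather than the mean is a choice made here because a few samples disturbed by the scheduler can inflate an average considerably.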
Rabbit Hole: