
I'd like to know how you'd measure the number of clock cycles per instruction, say, copying an int from one place to another.

I know you can time it down to nanoseconds, but with today's CPUs that resolution is too low to get a correct reading for operations that take just a few clock cycles.

Is there a way to confirm how many clock cycles instructions like adding and subtracting take in Python? If so, how?

Pongo
  • It's not possible. Python code is too far away from CPU operations. Why would you want to know? – PM 77-1 Mar 03 '22 at 16:21
  • Well, in order to write efficient code it's useful to know the weight of your fundamental operations, like moving a variable, declaring an array, and all the math ops. It also helps to see whether the CPU has special functionality for special-case executions. – Pongo Mar 03 '22 at 16:26
  • Not really. Unless you are writing an embedded system for specific hardware, you will not be able to predict what CPU your code will run on. Premature optimization is not really a good idea. – PM 77-1 Mar 03 '22 at 16:29
  • @PM77-1 is definitely correct. I might add that if you are actually looking at performance issues and want to optimize, I would start with [profiling](https://docs.python.org/3/library/profile.html) your Python code and looking at that output with tools like [snakeviz](https://jiffyclub.github.io/snakeviz/) – Matteo Zanoni Mar 03 '22 at 16:34
  • I am actually an embedded programmer, so it comes as routine, but it'd still be nice to know how many clock cycles it takes to perform, say, a numpy operation on array elements that are ints, floats, etc. I'll take a deeper look into @MatteoZanoni's snakeviz and profiling after I complete a few more Python projects to see if it gives relevant enough data about execution performance. Thanks for that. – Pongo Mar 03 '22 at 19:21

1 Answer


This is a very interesting question that can easily throw you down the rabbit hole. Basically, any CPU cycle measurement depends on your processor's and compiler's RDTSC implementation.

For Python there is a package called hwcounter that can be used as follows:

# pip install hwcounter 

from hwcounter import Timer, count, count_end
from time import sleep

# Method-1
start = count()
# Do something here:
sleep(1)
elapsed = count_end() - start
print(f'Elapsed cycles: {elapsed:,}')

# Method-2
with Timer() as t:
    # Do something here:
    sleep(1)
print(f'Elapsed cycles: {t.cycles:,}')

NOTE: It seems that the hwcounter implementation is currently broken for Windows Python builds. A working alternative is to build the pip package using the mingw compiler instead of MS VS.


Caveats

Results from this method always depend on how your computer schedules tasks and threads among its processors. Ideally you'd need to:

  • Bind the test code to one unused processor (a.k.a. processor affinity).
  • Run the tests 1k to 1M times to get a good average.
  • Have a good understanding of not only compilers, but also how Python optimizes its code internally. Many things are not at all obvious, especially if you come from a C/C++/C# background.
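
The first two points can be sketched roughly as below. This is only a minimal illustration, not part of hwcounter: the helper names `pin_to_one_cpu`, `measure` and `add_ints` are made up for this example, and the default counter is the standard library's `time.perf_counter_ns` (nanoseconds, not cycles) so that it runs without extra packages. It also assumes `os.sched_setaffinity` is available, which is only the case on some platforms such as Linux; elsewhere the pinning step is skipped.

```python
import os
import statistics
import time


def pin_to_one_cpu():
    """Try to bind this process to a single CPU; return its id, or None."""
    # os.sched_setaffinity only exists on some platforms (e.g. Linux).
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        cpu = min(os.sched_getaffinity(0))  # pick one CPU we may run on
        os.sched_setaffinity(0, {cpu})
    except OSError:
        return None  # pinning refused; continue unpinned
    return cpu


def measure(fn, repeats=10_000,
            start=time.perf_counter_ns, stop=time.perf_counter_ns):
    """Call fn() `repeats` times; return (minimum, median) of the readings."""
    samples = []
    for _ in range(repeats):
        t0 = start()
        fn()
        samples.append(stop() - t0)
    return min(samples), statistics.median(samples)


def add_ints():
    # Hypothetical workload: a single integer addition.
    a = 3
    b = 4
    return a + b


cpu = pin_to_one_cpu()
lowest, median = measure(add_ints)
print(f"CPU: {cpu}, min: {lowest:,}, median: {median:,}")
```

If hwcounter is installed, passing its `count` and `count_end` as `start` and `stop` should give readings in cycles instead of nanoseconds. Either way the numbers include the overhead of the Python function call and of the counter itself, so the minimum is usually a more useful figure than the mean.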

Rabbit Hole:

not2qubit