I have a data processing algorithm implemented in Python 2.7, and I need to port it to an embedded system (either a microcontroller or a more capable board). To choose the hardware, I need to know how many floating-point operations the algorithm performs and how much memory it uses in total.
How can I determine these efficiently?
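
To make the operation count concrete, here is a minimal sketch of one possible approach: run the algorithm on a float-like wrapper that tallies each arithmetic operation. It assumes the algorithm is pure Python using only `+`, `-`, `*` and `/` on scalars; it would not see work done inside NumPy or `math` functions, and it says nothing about memory. The names `CountingFloat` and `dot` are hypothetical, and `dot` is only a stand-in for the real algorithm.

```python
class CountingFloat(object):
    """Hypothetical float wrapper that tallies arithmetic operations."""

    # Shared tally across all instances (an assumption made for simplicity).
    counts = {"add": 0, "sub": 0, "mul": 0, "div": 0}

    def __init__(self, value):
        self.value = float(value)

    @staticmethod
    def _raw(other):
        # Accept either another CountingFloat or a plain number.
        return other.value if isinstance(other, CountingFloat) else float(other)

    def _op(self, name, result):
        # Record one operation and wrap the result so counting continues.
        CountingFloat.counts[name] += 1
        return CountingFloat(result)

    def __add__(self, other):
        return self._op("add", self.value + self._raw(other))
    __radd__ = __add__

    def __sub__(self, other):
        return self._op("sub", self.value - self._raw(other))

    def __rsub__(self, other):
        return self._op("sub", self._raw(other) - self.value)

    def __mul__(self, other):
        return self._op("mul", self.value * self._raw(other))
    __rmul__ = __mul__

    def __truediv__(self, other):
        return self._op("div", self.value / self._raw(other))
    __div__ = __truediv__  # Python 2.7 uses __div__ for "/"

    def __rtruediv__(self, other):
        return self._op("div", self._raw(other) / self.value)
    __rdiv__ = __rtruediv__


def dot(xs, ys):
    # Stand-in for the real algorithm: a plain dot product.
    total = CountingFloat(0.0)
    for x, y in zip(xs, ys):
        total = total + x * y
    return total


xs = [CountingFloat(v) for v in (1.0, 2.0, 3.0)]
ys = [CountingFloat(v) for v in (4.0, 5.0, 6.0)]
result = dot(xs, ys)
flops = sum(CountingFloat.counts.values())
```

For this three-element dot product the tally is 3 multiplications and 3 additions. The count comes from the Python-level operations only, so it is an estimate of the arithmetic workload, not a measurement of what a compiled port would execute on the target.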