In order to profile application runtimes whose binaries will actually be run under a simulator (NS-3/DCE), I wanted to use the Linux performance counters. I expected the instruction count of an application that has no source of non-determinism to be deterministic, but according to the performance counters I couldn't be more wrong. Take a simple example:
$ (perf stat -c -- sleep 1 2>&1 && perf stat -c -- sleep 1 2>&1) |grep instructions
669218 instructions # 0,61 insns per cycle
682286 instructions # 0,58 insns per cycle
1) What is the source of this non-determinism? Does it stem from low-level branch prediction and other speculative engines in the CPU?
2) A second question: is there a way to know the number of instructions fed to the CPU (as opposed to the count shown in the example output), so that the amount of executed code can be measured deterministically?
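For reference, a variation I have been experimenting with (just a sketch, assuming the jitter comes largely from kernel-side activity such as interrupts): split the count into user-space and kernel-space instructions with the :u/:k event modifiers and repeat the run so perf reports the spread across runs:

$ # count user- and kernel-space instructions separately, 10 repetitions
$ perf stat -e instructions:u -e instructions:k -r 10 -- sleep 1

If the user-space count turns out to be stable across runs, that would at least answer question 2 for the code executed in user space, but I have not verified this.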