I'm trying to get into C optimization, but when I run perf, the reports don't really make sense to me.
I created a test program:
int main()
{
    return 0;
}
Compiled it: gcc test.c -o test -std=c99 -O2 -lm
And ran perf:
perf stat -B -r 20 -e "cycles,instructions,cache-references,cache-misses,branches,branch-misses,cpu-clock,task-clock,faults,cs,migrations,alignment-faults" test
This is the output:
Performance counter stats for 'test' (20 runs):

           918.130      cycles                    #    1,640 GHz                   ( +- 0,75% )
           871.395      instructions              #    0,95  insn per cycle        ( +- 0,31% )
            35.793      cache-references          #   63,926 M/sec                 ( +- 0,90% )
             7.897      cache-misses              #   22,062 % of all cache refs   ( +- 3,81% )
           176.129      branches                  #  314,562 M/sec                 ( +- 0,26% )
             7.300      branch-misses             #    4,14% of all branches       ( +- 1,04% )
              0,56 msec cpu-clock                 #    0,648 CPUs utilized         ( +- 3,87% )
              0,56 msec task-clock                #    0,648 CPUs utilized         ( +- 3,87% )
                59      faults                    #    0,106 M/sec                 ( +- 0,35% )
                 0      cs                        #    0,000 K/sec
                 0      migrations                #    0,000 K/sec
                 0      alignment-faults          #    0,000 K/sec

       0,0008638 +- 0,0000357 seconds time elapsed  ( +- 4,13% )
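As far as I can tell, the numbers are at least internally consistent: 918 thousand cycles at the reported 1.64 GHz works out to roughly 0.56 ms, which matches the cpu-clock and task-clock lines, and 871 thousand instructions over 918 thousand cycles gives the 0.95 insn per cycle shown, so the counters don't look like noise.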
I'm not sure if I'm missing something, but I can't see any reason why a program that just returns 0 would execute 871 thousand instructions, take 7 thousand cache misses, and make 176 thousand branches.
Am I doing something wrong when running perf, or am I just completely misunderstanding what the output is supposed to mean?
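In case a comparison point helps, this is the kind of follow-up test I had in mind: a program with enough real work in it (the 100 million iteration count is just an arbitrary value I picked) that the instruction and branch counts should be dominated by the loop itself rather than by whatever else gets counted around main:

#include <stdio.h>

int main()
{
    /* volatile so -O2 doesn't optimize the loop away */
    volatile unsigned long sum = 0;

    /* enough iterations that the loop's own instructions and branches
       should dwarf anything that happens outside main */
    for (unsigned long i = 0; i < 100000000UL; i++)
        sum += i;

    printf("%lu\n", sum);
    return 0;
}

I'd build and measure it with the same gcc and perf stat invocations as above; if the counts scale with the iteration count, then at least the loop itself is being measured the way I expect.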