
I'm trying to get into C optimization. When running perf, the reports don't really make sense to me.

I created a test program:

int main()
{
    return 0;
}

Compiled it: gcc test.c -o test -std=c99 -O2 -lm

And ran perf:

perf stat -B -r 20 -e "cycles,instructions,cache-references,cache-misses,branches,branch-misses,cpu-clock,task-clock,faults,cs,migrations,alignment-faults" test

This is the output:

Performance counter stats for 'test' (20 runs):

918.130 cycles # 1,640 GHz ( +- 0,75% )
871.395 instructions # 0,95 insn per cycle ( +- 0,31% )
35.793 cache-references # 63,926 M/sec ( +- 0,90% )
7.897 cache-misses # 22,062 % of all cache refs ( +- 3,81% )
176.129 branches # 314,562 M/sec ( +- 0,26% )
7.300 branch-misses # 4,14% of all branches ( +- 1,04% )
0,56 msec cpu-clock # 0,648 CPUs utilized ( +- 3,87% )
0,56 msec task-clock # 0,648 CPUs utilized ( +- 3,87% )
59 faults # 0,106 M/sec ( +- 0,35% )
0 cs # 0,000 K/sec
0 migrations # 0,000 K/sec
0 alignment-faults # 0,000 K/sec

0,0008638 +- 0,0000357 seconds time elapsed ( +- 4,13% )

I may be missing something, but I can't see why a program that only returns 0 would execute 871 thousand instructions, incur 7 thousand cache misses, and take 176 thousand branches.

Am I doing something wrong when running perf? Or just completely misunderstanding what the output is supposed to mean?

aturan23
  • You're probably just reporting the performance of the startup code in the C runtime library. – Barmar Apr 06 '20 at 04:46
  • I was thinking about that, but when trying to find an answer I came across this question https://stackoverflow.com/questions/23605341/i-dont-understand-cache-miss-count-between-cachegrind-vs-perf-tool with a much more complex program and fewer cache misses. perf seems to have been run the same way. Could it be inconsistent across systems? And is it possible to profile only my code? – Vinicius Lambardozzi Apr 06 '20 at 04:56
  • There is a `--time` option in [`perf report`](http://man7.org/linux/man-pages/man1/perf-report.1.html). To limit profiling to part of the program (skipping the startup code of libc and other libraries, which varies across systems such as different versions of Ubuntu, Debian, or RHEL), I suggest adding a `sleep(5)` call before your code and using `perf report --time start,end`, where the start and end times can be parsed manually out of the `perf script` output. (Any library linked into your application can register "constructor" functions that run before your `main()` to initialize things like locales.) – osgx Apr 09 '20 at 23:30
  • For `perf stat`, use `perf stat -I 1000` after adding `sleep(5)` before your code to separate the libraries' initialization work from your own code's work. This option asks `perf stat` to print statistics every second. Alternatively, use `perf stat --delay 100` together with the added `sleep(5)`, which asks `perf stat` not to count during the first 100 milliseconds. I'm not sure whether this will work. – osgx Apr 09 '20 at 23:38
