I am trying to profile a C++ program. As a first step, I want to determine whether the program is compute-bound or memory-bound using the Roofline model, so I need to measure the following four quantities:
- W: number of floating-point operations performed by the program (FLOPs)
- Q: number of bytes of memory traffic incurred by the program (Bytes)
- π: peak performance (FLOP/s)
- β: peak bandwidth (Byte/s)
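For context, here is a minimal sketch of how I understand these four quantities are combined, assuming the usual Roofline formulas: arithmetic intensity I = W / Q, and attainable performance P = min(π, β · I). The struct and function names are hypothetical and chosen only for illustration; they do not come from any library.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical container for the two machine parameters.
struct Roofline {
    double peak_flops;      // π, in FLOP/s
    double peak_bandwidth;  // β, in Byte/s
};

// Arithmetic intensity I = W / Q, in FLOP per byte.
double arithmetic_intensity(double flops, double bytes) {
    assert(bytes > 0.0);  // Q must be positive for the ratio to make sense
    return flops / bytes;
}

// Attainable performance P = min(π, β · I), in FLOP/s.
double attainable_flops(const Roofline& m, double intensity) {
    return std::min(m.peak_flops, m.peak_bandwidth * intensity);
}

// The program is memory-bound when I lies below the ridge point π / β.
bool is_memory_bound(const Roofline& m, double intensity) {
    return intensity < m.peak_flops / m.peak_bandwidth;
}
```

With made-up machine numbers such as π = 100e9 FLOP/s and β = 20e9 Byte/s, the ridge point is 5 FLOP/Byte, so a program with I = 2 would be classified as memory-bound and one with I = 8 as compute-bound.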
I have tried to use Linux perf to measure W. I followed the instructions here, using libpfm4 to determine the available events (by running ./showevinfo). I found that my CPU supports the INST_RETIRED event with umask X87, so I used ./check_events INST_RETIRED:X87 to find the code, which is 0x5302c0. Then I ran perf stat -e r5302c0 ./test_exe and got:
Performance counter stats for './test_exe':
83,381,997 r5302c0
20.134717382 seconds time elapsed
74.691675000 seconds user
0.357003000 seconds sys
Questions:
- Is this the right procedure for measuring W of my program? If so, W should be 83,381,997 FLOPs, right?
- Why is this count not stable between repeated executions?
- How can I measure the other quantities: Q, π and β?
Thanks for your time and any suggestions.