I want to profile my C++ program with Linux perf. To do this I ran the following three pairs of commands, and I do not understand why I get three completely different reports.
perf record --call-graph dwarf ./myProg
perf report
perf record --call-graph fp ./myProg
perf report
perf record --call-graph lbr ./myProg
perf report
Also, I do not understand why the main function is not the highest function in the list.
The logic of my program is the following: the main function calls the getPogDocumentFromFile function, which calls fromPoxml, which calls toPred, which calls applySubst, which calls subst. Moreover, toPred, applySubst and subst are recursive functions, and I expect them to be the bottleneck.
Some more comments: my program runs for about 25 minutes, is highly recursive, and allocates a lot of memory (~17 GB). I compile with -fno-omit-frame-pointer and use a recent Intel CPU.
Any ideas?
EDIT:
Thinking again about my question, I realize that I do not understand the meaning of the Children column.
So far I assumed that the Self column was the percentage of samples in which the function is at the top of the call stack, and that the Children column was the percentage of samples in which the function appears anywhere in the call stack. Obviously this is not the case; otherwise the main function would have a Children value close to 100%. Maybe the call stack is truncated? Or am I completely misunderstanding how profilers work?