0

I've run some sample code through perf, and get the following top hotspot:

-   63.93%     0.10%  bonds    libc.so.6              [.] __GI___fstatat64                                                                                                                     
     63.82% __GI___fstatat64                                                                                                                                                                   
-   63.82%    63.82%  bonds    [unknown]              [k] 0xffffffffa90001a2                                                                                                                   
     __GI___fstatat64                                                                                                                                                                          
+    5.32%     0.00%  bonds    [unknown]              [.] 0000000000000000

From this, I can see that the majority of the time is spent in the __GI___fstatat64 function inside glibc. I'm now using gdb to debug this code to see what code paths lead to this area of the code. As you can see, perf only shows me a very shallow call stack, i.e one function, where as gdb gives me the full backtrace:

(gdb) bt
#0  __GI___fstatat64 (fd=fd@entry=-100, file=file@entry=0x7ffff59bb46b "/etc/localtime", buf=buf@entry=0x7fffffffd440, flag=flag@entry=0)
    at ../sysdeps/unix/sysv/linux/fstatat64.c:154
#1  0x00007ffff591d706 in __GI___stat64 (file=file@entry=0x7ffff59bb46b "/etc/localtime", buf=buf@entry=0x7fffffffd440)
    at ../sysdeps/unix/sysv/linux/stat64.c:29
#2  0x00007ffff58eb174 in __tzfile_read (file=file@entry=0x7ffff59bb46b "/etc/localtime", extra=extra@entry=0, extrap=extrap@entry=0x0) at tzfile.c:159
#3  0x00007ffff58ead24 in tzset_internal (always=<optimized out>) at tzset.c:405
#4  0x00007ffff58eaf23 in __tz_convert (timer=1681057207, use_localtime=1, tp=0x7ffff59fd660 <_tmbuf>) at tzset.c:577
#5  0x00007ffff6da62ae in QuantLib::Date::todaysDate () at /mnt/hdd1/sandbox/temp/QuantLib/ql/time/date.cpp:770
#6  0x00007ffff67e4a35 in QuantLib::Settings::DateProxy::operator QuantLib::Date (
    this=0x7ffff7e238a0 <QuantLib::Singleton<QuantLib::Settings, std::integral_constant<bool, false> >::instance()::instance>)
    at /mnt/hdd1/sandbox/temp/QuantLib/ql/settings.hpp:136
#7  QuantLib::Bond::isExpired (this=0x7fffffffda00) at /mnt/hdd1/sandbox/temp/QuantLib/ql/instruments/bond.cpp:107
#8  0x000000000040eb9c in QuantLib::Instrument::calculate (this=0x7fffffffda00) at /usr/local/include/ql/instrument.hpp:148
#9  0x00000000004103fa in QuantLib::Instrument::NPV (this=this@entry=0x7fffffffda00) at /usr/local/include/ql/instrument.hpp:185
#10 0x000000000040e272 in go () at /mnt/hdd1/sandbox/work/cpp/quantlib/bonds.cpp:93

Show I not be able to see an equally rich call stack in perf? I'm invoking perf like so with code compiled with -Og -ggdb3:

perf record -g ./bonds
perf report -g
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
imrichardcole
  • 4,633
  • 3
  • 23
  • 45
  • According to the man page, `perf record --call-graph` defaults to `fp` (frame pointer) (not `dwarf` or `lbr`), but you maybe didn't compile with `-fno-omit-frame-pointer`. Unless `-Og` doesn't disable frame-pointers? It's not clear from the man page what exactly `-g` does; `--call-graph` says it implies `-g` not that it's a synonym. – Peter Cordes Apr 09 '23 at 18:35
  • Relevant? https://stackoverflow.com/questions/60491534/weird-backtrace-in-perf – Paul Sanders Apr 09 '23 at 19:32

2 Answers2

1

Working through the provided comments to show how I arrived at a solution that gave me what I needed.

Adding -fno-omit-frame-pointer compiler flag

Now when I run perf record using the following command - perf record -F 99 -g ./bonds it certainly gives more detail, but still not really the full call stack I was looking for.

Adding --call-graph dwarf to record stage

This binary didn't have -fno-omit-frame-pointer flag set, but using the command - perf record -F 99 --call-graph dwarf ./bonds gives me pretty much exactly what I was looking for and very closely matches what I'm seeing in gdb

So in the context of my question, adding --call-graph dwarf to the record command appears to be the correct solution:

-   65.66%    65.66%  bonds    [unknown]              [k] 0xffffffffa90001a2                                                                                                                                                                                       ▒
     _start                                                                                                                                                                                                                                                        ◆
     __libc_start_main_alias_2 (inlined)                                                                                                                                                                                                                           ▒
     __libc_start_call_main                                                                                                                                                                                                                                        ▒
     main                                                                                                                                                                                                                                                          ▒
   - go                                                                                                                                                                                                                                                            ▒
      - 31.64% QuantLib::Bond::cleanPrice                                                                                                                                                                                                                          ▒
         - 16.64% QuantLib::Bond::settlementDate                                                                                                                                                                                                                   ▒
              QuantLib::Settings::DateProxy::operator QuantLib::Date (inlined)                                                                                                                                                                                     ▒
              QuantLib::Date::todaysDate                                                                                                                                                                                                                           ▒
              __tz_convert                                                                                                                                                                                                                                         ▒
              tzset_internal                                                                                                                                                                                                                                       ▒
              __tzfile_read                                                                                                                                                                                                                                        ▒
              __fstatat (inlined) 
imrichardcole
  • 4,633
  • 3
  • 23
  • 45
-1

You cannot see a full call stack with perf, it just doesn't work in this way.
perf is a sampling profiler. Roughly speaking, it wakes up a number of times per second and records which function is being executed right now. Basing on that, perf report shows you statistics. Thus, functions that are called rarely (or executed too fast) just don't fall into it.