
I'm profiling my app on x86_64 Linux, building with gcc (-O2) and using perf record. My app runs in 1.5 seconds. Using `perf record -F max ./myapp myargs`, I've been getting numbers that don't match reality. I want perf to give me a call tree so I can figure out where to start looking. With `perf record -F max --call-graph fp ./myapp myargs`, only a few functions show any children and look like a tree.

What build options do I need to get a proper tree? Do I need to remove static from my functions (I use it when a function doesn't need to be visible across files)? I happen to know that one specific function takes > 10% of my runtime (I checked via clock_gettime and rdtscp, only adding a sample when the CPU matches the start CPU). perf reports that complex function as 4% of my CPU time (lower than reality), and its parent function (the only function that can call it) shows children as .2 and self as .19, which isn't true.
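For context, the manual timing is along these lines. This is a simplified sketch rather than my real code: `complex_function`, `timed_call`, `report_timing`, and the counters are placeholder names, and it assumes Linux, where the kernel stores the CPU number in `IA32_TSC_AUX` so that `__rdtscp` reports which CPU the read happened on.

```c
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* __rdtscp (GCC/Clang, x86 only) */

/* Written inside the measured function so the compiler cannot treat the
 * call as side-effect free and hoist it out of the timed region. */
static volatile uint64_t g_sink;

/* Placeholder workload standing in for the real function (hypothetical). */
static __attribute__((noinline)) uint64_t complex_function(uint64_t n)
{
    uint64_t acc = 0;
    for (uint64_t i = 0; i < n; i++)
        acc += i * i;
    g_sink = acc;
    return acc;
}

static uint64_t total_cycles;   /* TSC ticks summed over accepted samples */
static uint64_t accepted;       /* samples where start CPU == end CPU */
static uint64_t discarded;      /* samples dropped because the thread migrated */

/* Time one call with rdtscp; keep the sample only if the thread
 * was on the same CPU at the start and at the end. */
static uint64_t timed_call(uint64_t n)
{
    unsigned int cpu_start, cpu_end;

    uint64_t t0 = __rdtscp(&cpu_start);
    uint64_t result = complex_function(n);
    uint64_t t1 = __rdtscp(&cpu_end);

    if (cpu_start == cpu_end) {
        total_cycles += t1 - t0;
        accepted++;
    } else {
        discarded++;
    }
    return result;
}

/* Print the accumulated totals, e.g. once at program exit. */
static void report_timing(void)
{
    printf("accepted=%llu discarded=%llu tsc_ticks=%llu\n",
           (unsigned long long)accepted,
           (unsigned long long)discarded,
           (unsigned long long)total_cycles);
}
```

The app calls `timed_call` wherever it used to call the function directly and calls `report_timing` once at exit. The totals are TSC ticks, not seconds, so I compare them against a TSC measurement of the whole run to get a percentage.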

The function I timed was an obvious candidate, but I wouldn't know which other functions may be slow. Is 1.5 seconds too short of a run? Do I need to manually insert timing code when runs are that short?

Stan
  • `--call-graph fp` would require `gcc -O2 -fno-omit-frame-pointer`. See [How can you get frame-pointer perf call stacks/flamegraphs involving the C++ standard library?](https://stackoverflow.com/q/68259699) - library code compiled without that option might not be part of the tree. – Peter Cordes Dec 07 '22 at 19:04
  • @PeterCordes That solved it, ty. It looks like all my time goes to virtual calls and if statements. I guess my input data isn't predictable. – Stan Dec 07 '22 at 19:27
