How to get call graph profiling working with gcc compiled code and ARM Cortex A8 target?

Question

I am stuck on this one...

I need to profile code on an ARM board and need to view call graphs. I tried OProfile, the kernel's perf and Google performance tools. All of them run fine, but none of them outputs any call-graph information.

This led me to the conclusion that I am not compiling my code correctly.

I use the following flags when compiling my C++ code:

Arch specific:

-march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=vfpv3

General:

-fexceptions -fno-strict-aliasing -D_REENTRANT -Wall -Wextra

Debugging (with optimization):

-O2 -g -fno-omit-frame-pointer

I did a lot of Google searching and found some related topics:

libunwind ?
dwarf
(asynchronous-)unwind-tables
-mapcs-frame

However, I do not fully understand how these are all connected. Any hints on how to get call graphs working?

Note (due to Rian's answer): I am interested in finding out if and why some methods take longer (in relation to others) on ARM than on x86-64. It does not help to do this on a different platform, even though my code compiles on both and I can produce call graphs on x86-64.

Are you sure you want to compile with -mfloat-abi=hard? From what I understand, -mfloat-abi=softfp still uses NEON but is more compatible with existing binaries, though it's not quite as performant as -mfloat-abi=hard: https://wiki.linaro.org/Linaro-arm-hardfloat — Rian Sanderson, Nov 30 '11 at 22:38
hard should improve performance and we build our whole distribution with hard. — Hannah S., Dec 05 '11 at 10:07
OProfile depends on the kernel. Did you reconfigure your kernel to enable profiling? — accuya, Mar 13 '12 at 09:45
Did you finally manage to get it working? I'm stuck with the exact same problem. — Simon, Jan 10 '14 at 12:01
No - at the time we gave up and resorted to trial and error :( — Hannah S., Nov 30 '15 at 11:41

score 2 · Answer 1 · answered Nov 30 '11 at 17:32

I know you want to do your profiling on an ARM Cortex-A8, but if you're interested in call graphs, why not compile for x86, run valgrind's callgrind tool and examine the results with kcachegrind?

The call graphs should be the same on both architectures: even if the compilers generate slightly different code for each function, the relationships between functions shouldn't change.

No special flags needed:

valgrind --tool=callgrind -v --dump-every-bb=10000000 ./some-app
kcachegrind &
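
To check whether a profiler reports call chains at all, it can help to stand in for ./some-app with a tiny program whose call hierarchy is known in advance. The following is only a sketch under assumptions: every name in it (leaf_work, cheap_path, expensive_path, run_workload, PROFILE_TARGET_MAIN) is made up for illustration, and __attribute__((noinline)) is a GCC/Clang extension. Both paths call the same leaf, so a flat profile only shows leaf_work, while a working call graph should attribute roughly nine times more inclusive cost to expensive_path than to cheap_path.

```cpp
// Hypothetical profiling target with a known call hierarchy:
//   run_workload -> cheap_path     -> leaf_work (n iterations)
//   run_workload -> expensive_path -> leaf_work (9 * n iterations)
#include <cstdint>
#include <cstdio>

// Keep each function as a separate frame, even at -O2.
#define NOINLINE __attribute__((noinline))

// Leaf: the volatile accumulator keeps the loop from being optimized away.
NOINLINE std::uint64_t leaf_work(std::uint64_t iterations) {
    volatile std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        acc = acc + (i ^ (i >> 3));
    }
    return acc;
}

// The addition after each call keeps the compiler from turning it into a
// tail call, which would drop the caller's frame from the call chain.
NOINLINE std::uint64_t cheap_path(std::uint64_t n) {
    return leaf_work(n) + 1;
}

NOINLINE std::uint64_t expensive_path(std::uint64_t n) {
    return leaf_work(9 * n) + 2;
}

NOINLINE std::uint64_t run_workload(std::uint64_t n) {
    return cheap_path(n) + expensive_path(n);
}

// Define PROFILE_TARGET_MAIN to build a standalone binary.
#ifdef PROFILE_TARGET_MAIN
int main() {
    const std::uint64_t total = run_workload(50000000ULL);
    std::printf("%llu\n", static_cast<unsigned long long>(total));
    return 0;
}
#endif
```

Built with something like g++ -O2 -g -fno-omit-frame-pointer -DPROFILE_TARGET_MAIN, this can be run under callgrind as above or under a sampling profiler on the target board. If the output shows leaf_work but no split between its two callers, the tool is not recovering call chains.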

Rian Sanderson

I am especially interested in the cumulative time per function (including subcalls), so I can see how some usage patterns might affect performance on a specific platform. This is hard to transfer from one platform to the other since the "per invocation chain" relations will differ. But your answer is good advice for somebody who just wants to get call graphs without being interested in the timings. +1 – Hannah S. Nov 30 '11 at 19:15
