How to profile my C++ application on Linux

Question

I would like to profile my C++ application on Linux. I would like to find out how much time my application spends on CPU processing versus how much time it spends blocked on IO or idle.

I know there is a profiling tool called valgrind on Linux. But it breaks down the time spent in each method, and it does not give me an overall picture of how much time is spent on CPU processing versus idle. Or is there a way to do that with valgrind?

Let's say 'time' tells me my application takes 20 sec. How does valgrind break down how much of that 20 sec I spend on CPU processing versus how much of it I am idle? I understand valgrind breaks down the cost of each function while the CPU is processing. I want to find out the ratio between CPU processing time and idle time (waiting for network traffic, IO calls, etc). – richard May 12 '10 at 20:29

florin · Answer 1 · 2010-05-12T21:11:42.313

6

Check out oprofile. Also for more system-level diagnostics, try systemtap.

edited May 12 '10 at 21:11

answered May 12 '10 at 20:02

florin


The problem with OProfile is that it only measures CPU time. Time blocked on IO or system calls won't show up in its reports. – deft_code May 12 '10 at 20:08
@Caspin: you can deduct the io-time from wall-clock time. – florin May 12 '10 at 21:10

Kaleb Pederson · Accepted Answer · 2010-05-12T20:18:57.377

3

I can recommend valgrind's callgrind tool in conjunction with KCacheGrind for visualization. KCacheGrind makes it pretty easy to see where the hotspots are.

Note: It's been too long since I used it, so I'm not sure if you'll be able to get I/O Wait time out of that. Perhaps in conjunction with iostat or pidstat you'll be able to see where all the time was spent.

edited May 12 '10 at 20:18

answered May 12 '10 at 20:06

Kaleb Pederson


Callgrind only records system time, not idle time. – Šimon Tóth May 12 '10 at 20:40

score 3 · Answer 3 · answered May 12 '10 at 20:33

3

LTTng is a good tool to use for full system profiling.

answered May 12 '10 at 20:33

Yann Ramin


score 3 · Answer 4 · answered May 12 '10 at 21:06

3

You might want to check out Zoom, which is a lot more polished and full-featured than oprofile et al. It costs money ($199), but you can get a free 30 day evaluation licence.

answered May 12 '10 at 21:06

Paul R


Andrew G · Answer 5 · 2010-05-12T20:50:41.277

2

callgrind is a very good tool, but I found OProfile to be more 'complete'. Also, it is the only one that lets you specify module and/or kernel source to allow deeper insight into your bottlenecks. The output is supposed to be able to interface with KCacheGrind, but I had trouble with that, so I used Gprof2Dot instead. You can export your callgraph to a .png.

Edit:

OProfile looks at the overall system so the process will just be:

[set up oprofile]

opcontrol --init
opcontrol --vmlinux=/path/to/vmlinux     (or --no-vmlinux)
opcontrol --start

[run your app here]

opcontrol --stop   (or opcontrol --shutdown; see the man page for the difference)

Then, to start looking at the results, see the man page for opreport.

edited May 12 '10 at 20:50

answered May 12 '10 at 20:19

Andrew G


Do I need to compile my program with special flags for OProfile to work? – richard May 12 '10 at 20:35
what is 'vmlinux'? where can I find it? – richard May 13 '10 at 00:16

timday · Answer 6 · 2010-05-12T22:44:57.560

If your app simply runs "flat out" (i.e. it's either using CPU or waiting for I/O) until it exits, and there aren't other processes competing, just do time myapp (or maybe /usr/bin/time myapp, which produces slightly different output from the shell builtin).

This will get you something like:

real    0m1.412s
user    0m1.288s
sys     0m0.056s

In this case, user+sys (kernel) time accounts for almost all of the real time, and there's just 0.068s unaccounted for (probably time spent initially loading the app and its supporting libs).

However, if you were to see:

real    0m5.732s
user    0m1.144s
sys     0m0.078s

then your app spent 4.51s not consuming CPU, presumably blocked on IO, which is the information I think you're looking for.

However, where this simple analysis technique breaks down is:

Apps which wait on a timer/clock or other external stimulus (e.g. event-driven GUI apps). It can't distinguish time waiting on the clock from time waiting on disk/network.
Multithreaded apps, which need a bit more thinking about to interpret the numbers.

Well, I think I'm searching for the same tool, but I must say this isn't a very informative post. The problem is to find areas of code that are (for some as yet unknown reason) waiting for something, determine the reasons for the waiting, and try to eliminate it. For example, I have a three-part network application whose performance I need to improve, but even under extreme workload the system spends most of its time waiting. – Šimon Tóth May 13 '10 at 09:42

score 0 · Answer 7 · answered May 12 '10 at 20:03

0

The lackey and/or helgrind tools in valgrind should allow you to do this.

answered May 12 '10 at 20:03

wash


score 0 · Answer 8 · answered Jun 15 '11 at 07:58

0

google-perf-tools is a much faster alternative to callgrind (and it can generate output in the same format as callgrind, so you can use KCacheGrind).

answered Jun 15 '11 at 07:58

frp


score -1 · Answer 9 · edited May 23 '17 at 11:50

See this post.

And this post.

Basically, between the time the program starts and when it finishes, it has a call stack. During I/O, the stack terminates in a system call. During computation, it terminates in a typical instruction.

Either way, if you can sample the stack at random wall-clock times, you can see exactly why it's spending that time.

The only remaining point is that thousands of samples might give a sense of confidence, but they won't tell you much more than 10 or 20 samples will.
