I would respectfully disagree with Matt.
The technique I use all the time on Windows is random pausing, and it works with any language the IDE supports.
As an example of using it to do performance tuning, this case shows how a speedup of 43 times was achieved through a series of steps.
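If a concrete sketch helps: the real thing is just the debugger's pause button, but you can fake the idea in a few lines. The toy program below (and all its names) is invented purely for illustration and assumes a Unix-like system with Python. Run it, hit Ctrl+\ a handful of times while it grinds, and whatever line appears on most of the stack printouts is where the time is going.

```python
# Poor-man's random pausing, sketched in Python (the real thing is just the
# debugger's pause button).  This toy program and its names are invented for
# illustration.  On a Unix-like system, run it and hit Ctrl+\ a few times;
# each SIGQUIT prints the stack at the instant you "paused" the program.
import signal
import traceback

def show_stack(signum, frame):
    # One "pause": dump the call stack exactly as it stood when interrupted.
    traceback.print_stack(frame)
    print("-" * 40)

signal.signal(signal.SIGQUIT, show_stack)

def parse_field(text):
    return text.strip().lower()                 # cheap by itself...

def load_record(line):
    # ...but invoked far more often than it needs to be (the planted waste).
    return [parse_field(f) for f in line.split(",") * 50]

def main():
    data = "alpha, beta, gamma, delta\n" * 2000
    for _ in range(100):
        records = [load_record(ln) for ln in data.splitlines()]
    print("processed", len(records), "records")

if __name__ == "__main__":
    main()
```

If the line containing `line.split(",") * 50` shows up on, say, 8 of 10 pauses, it accounts for roughly 80% of the time, and you know exactly which call site to fix, without any profiler report.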
Gprof has a lot of problems, listed here, and judging from the google-perftools manual, some of the same issues recur there: reporting by procedure rather than by line, emphasizing self (local) time, emphasizing the call graph, and so on. (I can't tell from the documentation whether it samples while blocked.)
As software systems become ever larger, self time becomes less and less relevant. The program counter spends most of its time in library routines or blocked in the system.
Call graphs become gigantic nests.
People ask "I know function X is costly, but where in function X is the problem?"
What's more, the "bottlenecks" get bigger and bigger, because the stack gets deeper on average, and every layer of the stack is a fresh opportunity to do more function calls than necessary.
Zoom is an example of a stack sampler that reports percent by line, samples while blocked, and lets the user control when sampling happens, so that time spent waiting for user input does not dilute the sample set.
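I don't know Zoom's internals, but a toy wall-clock sampler in the same spirit fits in a page. Everything below (the sampler, `fetch_record`, the intervals) is made up for illustration. Because it samples on wall time from a separate thread and charges every line currently on the stack, it reports percent by line, gives credit to call sites rather than just leaves, and keeps counting while the program is blocked.

```python
# A toy wall-clock stack sampler in the spirit of Zoom (illustration only; all
# names here are made up).  It charges every line currently on the stack, so
# call sites get credit, and since it runs on wall time it counts blocked time.
import collections
import sys
import threading
import time

def sampler(main_ident, counts, totals, stop, interval=0.01):
    while not stop.is_set():
        frame = sys._current_frames().get(main_ident)
        if frame is not None:
            totals[0] += 1
            while frame is not None:
                # Inclusive attribution: the leaf AND every call site above it.
                counts[(frame.f_code.co_name, frame.f_lineno)] += 1
                frame = frame.f_back
        time.sleep(interval)

def fetch_record(i):
    time.sleep(0.005)            # stands in for a blocking call (I/O, a lock)
    return i * i

def work():
    out = []
    for i in range(400):
        out.append(fetch_record(i))      # the "heavy branch" (a call site)
    return out

if __name__ == "__main__":
    counts, totals, stop = collections.Counter(), [0], threading.Event()
    t = threading.Thread(target=sampler,
                         args=(threading.get_ident(), counts, totals, stop),
                         daemon=True)
    t.start()
    work()
    stop.set()
    t.join()
    for (name, lineno), n in counts.most_common(5):
        print(f"{100.0 * n / totals[0]:5.1f}%  {name}:{lineno}")
```

The report should show both `fetch_record` (the leaf, blocked in `time.sleep`) and the `fetch_record(i)` call site inside `work` at close to 100%, which is exactly the answer to "where in function X is the problem?" that a procedure-level, self-time, CPU-only profile can't give.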
EDIT: Sorry, can't leave well enough alone. Here's a new explanation:
The way programs work, they trace out a call tree, which is a lot like the oak tree outside my window. It has a trunk (main) which sprouts branches (call sites) which sprout further branches for several levels out to leaves (instructions) and acorns (blocking calls).
When the tree surgeon comes to prune (optimize) it, does he look only where the leaves are (hotspots)? Does he ignore acorns (no samples during blocking)?
No, he looks for branches (call sites) that are both heavy (on the stack a lot) and unhealthy (unnecessary). Those are what he prunes.
That's what random pausing and Zoom do: they help you find those call sites.