2

I'm trying to pick a performance analyzer to use. I'm a beginner developer and I'm not sure what to look for in a performance analyzer. What are the most important features?

7 Answers

4

If you use valgrind, I can highly recommend KCacheGrind to visualize performance bottlenecks.
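If you haven't used the pair before, the usual workflow is roughly as follows (a minimal sketch; the program name and the trivial hot function are just placeholders):

    /* hot.c - toy program to profile (placeholder, not from the answer above).
     * Build with symbols, run under callgrind, then open the result:
     *   gcc -g -O0 hot.c -o hot            (-O0 so the call is not inlined away)
     *   valgrind --tool=callgrind ./hot
     *   kcachegrind callgrind.out.<pid>
     */
    #include <stdio.h>

    static double burn(long n) {            /* deliberately expensive function */
        double s = 0.0;
        for (long i = 1; i <= n; ++i)
            s += 1.0 / (double)i;
        return s;
    }

    int main(void) {
        printf("%f\n", burn(50000000L));
        return 0;
    }

KCacheGrind should then show burn() dominating both the flat function list and the call graph.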

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
  • The proportional call graph in KCacheGrind is great! See http://kcachegrind.sourceforge.net/html/pics/KcgShot2Large.png – activout.se Dec 04 '08 at 19:36
2

I would like the following features/output information to be shown by a profiler (a hand-rolled sketch of items 2 and 3 follows the list):

1.) It should be able to show the total clock cycles consumed, and the cycles consumed by each function.

2.) If not cycles, it should at least tell the total time consumed and the time spent in each function.

3.) It should also be able to tell how many times each function is called.

4.) It would be nice to know memory reads, memory writes, cache misses, and cache hits.

5.) Code memory used by each function.

6.) Data memory used: global constants, stack, and heap usage.
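For items 2 and 3, here is a hand-rolled idea of what a profiler automates for you (a rough sketch; the function names are made up):

    #include <stdio.h>
    #include <time.h>

    static unsigned long work_calls;     /* item 3: how many times work() was called */
    static double        work_seconds;   /* item 2: time spent inside work()         */

    static double now_seconds(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
    }

    static void work(void) {             /* the function being measured */
        double t0 = now_seconds();
        ++work_calls;
        volatile double s = 0.0;
        for (int i = 0; i < 1000000; ++i)
            s += i * 0.5;
        work_seconds += now_seconds() - t0;
    }

    int main(void) {
        for (int i = 0; i < 100; ++i)
            work();
        printf("work(): %lu calls, %.3f s total\n", work_calls, work_seconds);
        return 0;
    }

A real profiler gathers these numbers for every function without you editing the code; the hardware-counter items (cycles, cache misses) additionally need CPU/OS performance-counter support.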


goldenmean
  • 18,376
  • 54
  • 154
  • 211
1

The two classical answers (assuming you are in the *nix world) are valgrind and gprof. You want something that will let you (at least) check how much time you are spending inside each procedure or function.
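For gprof specifically, the flow is: compile with -pg, run the program once (which writes gmon.out), then feed both to gprof. A minimal sketch with made-up names:

    /* example.c - placeholder program, not from the answer above.
     *   gcc -pg -O0 -g example.c -o example -lm
     *   ./example                       (writes gmon.out in the current directory)
     *   gprof example gmon.out | less   (flat profile: time and call count per function)
     */
    #include <math.h>
    #include <stdio.h>

    static double slow_part(int n) {     /* should dominate the flat profile */
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                s += sqrt((double)(i * j + 1));
        return s;
    }

    static double fast_part(int n) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += (double)i;
        return s;
    }

    int main(void) {
        printf("%f %f\n", slow_part(2000), fast_part(2000));
        return 0;
    }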

bgoncalves
  • 1,687
  • 3
  • 19
  • 19
1
  • Stability - it should be able to profile your process for long durations without crashing or running out of memory. It's surprising how many commercial profilers fail at that.
shoosh
  • 76,898
  • 55
  • 205
  • 325
0

All you need is a debugger or IDE that has a "pause" button. It is not only the simplest and cheapest tool, but in my experience, the best. This is a complete explanation why. Note the 2nd-to-last comment.
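If you want to automate the same stack-sampling idea in-process (this is only a rough Linux/glibc sketch of my own, not part of the answer above, and no substitute for actually pausing and reading the stack yourself), you can arm a SIGPROF timer and dump the call stack each time it fires; whatever shows up on most of the dumps is where the time goes.

    /* sampler.c - poor-man's stack sampler (hypothetical sketch).
     *   gcc -g -rdynamic sampler.c -o sampler    (-rdynamic helps symbol names appear)
     */
    #include <execinfo.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>

    static void on_prof(int sig) {
        (void)sig;
        void *frames[64];
        int n = backtrace(frames, 64);
        backtrace_symbols_fd(frames, n, STDERR_FILENO);  /* no malloc, so usable here */
        write(STDERR_FILENO, "----\n", 5);
    }

    static double busy(long n) {          /* stand-in for the real workload */
        double s = 0.0;
        for (long i = 1; i <= n; ++i)
            s += 1.0 / (double)i;
        return s;
    }

    int main(void) {
        signal(SIGPROF, on_prof);

        /* fire roughly every 10 ms of consumed CPU time */
        struct itimerval it = { { 0, 10000 }, { 0, 10000 } };
        setitimer(ITIMER_PROF, &it, NULL);

        printf("%f\n", busy(200000000L));
        return 0;
    }

If busy() is on nearly every printed stack, that is your answer, which is exactly what the manual pause technique tells you with far less machinery.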

EDIT because I thought of a better answer:

As an aside, I studied A.I. in the 70s, and an idea very much in the air was automatic programming, and a number of people tried to accomplish it. (I took my crack at it.) The idea is to try to automate the process of having a knowledge structure of a domain, plus desired functional requirements, to generate (and debug) a program that would accomplish those requirements. It would be a tour-de-force in automated reasoning about the domain of programming. There were some tantalizing demonstrations, but in a practical sense the field didn't go very far. Nevertheless, it did contribute a lot of ideas to programming languages, like contracts and logical verification techniques.

To build an ideal profiler, for the purpose of optimizing programs, it would get a sample of the program's state every nanosecond. Either on-the-fly or later (ideal, remember?) it would carefully examine each sample, to see if, knowing the reasons for which the program is executing, that particular nanosecond of work was actually necessary or could be somehow eliminated.

That would be billions of samples and a lot of reasoning, but of course there would be tremendous duplication, because any wastage costing, say, 10% of time would be evident in 10% of the samples. That wastage could be recognized from far fewer than a billion samples. In fact, 100 samples or even fewer could spot it, provided they were randomly chosen in time, or at least within the time interval the user cares about. This assumes the purpose is to find the wastage so we can get rid of it, as opposed to measuring it with great precision.
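To put a number on that (a quick back-of-the-envelope check of my own, not from the original answer): if some avoidable work costs a fraction f of total time, a random-in-time sample hits it with probability f, so for f = 0.10 and 20 independent samples:

    P(all 20 samples miss it) = (1 - 0.10)^20 = 0.9^20 ≈ 0.12
    P(it shows up at least once) ≈ 0.88

Something costing 30% of the time is missed by all 20 samples with probability 0.7^20 ≈ 0.0008, i.e. essentially never.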

Why would it be helpful to apply all that reasoning power to each sample? Well, if the programs were little, and it were only looking for things like O(n^2) code, it shouldn't be too hard. But suppose the state of the program consisted of a procedure stack 20-30 levels deep, possibly with some recursive function calls appearing more than once, possibly with some of the functions being calls to external processors to do IO, possibly with the program's action being driven by some data in a table. Then, to decide if the particular sample is wasteful requires potentially examining all or at least some of that state information, and using reasoning power to see if it is truly necessary in accomplishing the functional requirements.

What the profiler is looking for is nanoseconds being spent for dubious reasons. To see the reason it is being spent requires examining every function call site on the stack, and the code surrounding it, or at least some of those sites. The necessity of the nanosecond being spent requires the logical AND of the necessity of every statement being executed on the stack. It only takes one such function call site to have a dubious justification for the entire sample to have a dubious justification. So, if the entire purpose is to find nanoseconds being spent for dubious reasons, the more complicated the samples are, the better, and the more reasoning power brought to bear on each sample, the better. (That's why bigger programs have more room for speedup - they have deeper stacks, hence more calls, hence more likelihood of poorly justified calls.)

OK, that's in the future. However, since we don't need a huge number of samples (10 or 20 is very useful), and since we already have highly intelligent automatic programmers (powered by pizza and soda), we can do this now.

Compare that to the tools we call profilers today. The very best of them take stack samples, but what's their output? Measurements. "Hot paths". Rat's nest graphs. Eye-candy. From those, even an artificially intelligent programmer would easily miss large inefficiencies, except for the ones that are exposed by those outputs. After you fix the ones you do find, the ones you don't find are the ones that make all the difference.

One of the things one learns studying A.I. is, don't expect to be able to program a computer to do something if a human, in principle, can't also do it.

Community
  • 1
  • 1
Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
0

My preference is for sampling profilers rather than instrumented profilers. The profiler should be able to map sample data back to the source code, ideally in a GUI. The two best examples of this that I am aware of are:

Paul R
  • 208,748
  • 37
  • 389
  • 560
0

goldenmean has it right; I would add that line execution counts are sometimes handy as well (see the sketch below).
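If you are on gcc, gcov is the usual way to get those counts (a rough sketch; the file name is made up): compile with --coverage, run the program, then ask gcov for an annotated copy of the source with a per-line execution count.

    /* counts.c - placeholder, not from the answer above.
     *   gcc --coverage -O0 counts.c -o counts
     *   ./counts
     *   gcov counts.c      (writes counts.c.gcov, an annotated source listing
     *                       with the number of times each line was executed)
     */
    #include <stdio.h>

    int main(void) {
        int evens = 0;
        for (int i = 0; i < 1000; ++i)   /* this line's count will be ~1001 */
            if (i % 2 == 0)              /* executed 1000 times */
                ++evens;                 /* executed 500 times  */
        printf("%d\n", evens);
        return 0;
    }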

EvilTeach
  • 28,120
  • 21
  • 85
  • 141