gprof is a venerable and ground-breaking tool.
But, as you're finding out, it's very limited.
It's based on sampling the program counter and counting calls between functions, both of which have only a tenuous connection to what actually costs time.
Finding out what costs time in your program is actually quite simple.
The method I and others use is this.
The point is that as the program runs, each thread has a call stack, consisting of the current program counter plus a return address back to every function call instruction it is currently in the process of executing.
If you can take an X-ray snapshot of the call stack at a random point in time, and examine all its levels in the context of a debugger, you can tell exactly what the program was trying to do, and why, at that moment.
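If you'd like a concrete way to grab those snapshots, here is a minimal sketch of the idea in Python (my own illustration, not any particular tool): it shells out to gdb, which must be installed, and attaches to a process at random moments. The point is still to read each stack yourself, in the debugger's context, not to feed it to a summarizer.

```python
import random
import subprocess
import time

def grab_stack_samples(pid, n=10, max_wait=3.0):
    """Print the full call stack of a running process at n random times."""
    for i in range(n):
        # Wait a random interval so each sample lands at a random point in time.
        time.sleep(random.uniform(0.1, max_wait))
        # -batch runs the given commands and detaches; "bt" prints a backtrace.
        result = subprocess.run(
            ["gdb", "-batch", "-p", str(pid), "-ex", "bt"],
            capture_output=True, text=True,
        )
        print(f"--- sample {i + 1} ---")
        print(result.stdout)

grab_stack_samples(12345)  # 12345 is a placeholder PID; use your program's
```

Equally well, you can just run the program under a debugger and hit Ctrl-C a handful of times; the scripting only saves keystrokes.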
If it is spending 30% of its time doing something you never would have guessed, but that you don't really need, you will spot it on 3 out of 10 stack samples, more or less, and that's good enough to find it.
You won't know precisely what the percent is, but you will know precisely what the problem is.
The time that any instruction costs (the time you'd save if you got rid of it) is just the percentage of time it is on the stack, whether it is a non-call instruction or a call instruction.
If it costs enough time to be worth fixing, it will show up in a moderate number of samples.
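To put a rough number on "moderate": assuming each random sample independently hits the costly code with probability equal to its cost (a simple binomial model, my assumption), something costing 30% almost certainly shows up in ten samples:

```python
from math import comb

def p_at_least(k, n, f):
    """Probability that code costing fraction f of run time
    appears on at least k of n random stack samples."""
    return sum(comb(n, i) * f**i * (1 - f)**(n - i) for i in range(k, n + 1))

print(p_at_least(1, 10, 0.3))  # ~0.97: seen at least once
print(p_at_least(2, 10, 0.3))  # ~0.85: seen at least twice
```

So with ten samples you are very unlikely to miss anything big.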
ADDED: Not to belabor the point, but there's always somebody who says "That's too few samples, it will find the wrong stuff!"
Well, OK. Suppose you take ten random-time samples of the stack, and you see something you could get rid of on three of them.
How much does it cost?
Well, you don't know for sure. It has a probability distribution that looks like this:

(figure: the probability density of the cost, peaking at 30%, with the 2.5% tail regions at each end shaded)
You can see that the most likely cost is 30% (no surprise): removing a 30% cost leaves 70% of the run time, for a speedup of 10/7 ≈ 1.4x. But the true cost could be more or less than that.
How much more or less?
Well, the clear space between the two shaded regions holds 95% of the probability.
In other words, yes, there is a chance that the cost is less than 10%, but that chance is only about 2.5%.
If the cost is 10%, the speedup is 10/9 ≈ 1.1x.
On the other hand, there is an equal probability that the cost is higher than 60%, for a speedup of 10/4 = 2.5x.
So the estimated speedup is 1.4x. It could be as low as 1.1x, but don't throw away the equal chance that it could be as high as 2.5x.
Of course, if you take 20 samples instead of 10, the curve will be narrower.
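If you want to reproduce these numbers, the curve above matches a Beta(hits + 1, misses + 1) posterior under a uniform prior (my assumption about how the plot was produced), and scipy makes it easy to check:

```python
from scipy.stats import beta

def cost_interval(hits, n):
    """Posterior for the cost fraction after it appears on `hits`
    of `n` random samples, assuming a uniform prior."""
    dist = beta(hits + 1, n - hits + 1)
    mode = hits / n                            # most likely cost
    lo, hi = dist.ppf(0.025), dist.ppf(0.975)  # central 95% interval
    return mode, lo, hi

print(cost_interval(3, 10))   # mode 0.30, roughly (0.11, 0.61)
print(cost_interval(6, 20))   # mode 0.30, a noticeably narrower interval
```

Run it for 3 of 10 and you get the interval discussed above; run it for 6 of 20 and you can see how much the extra samples tighten it.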