gprof does not take stack traces, and it works on CPU-time, not wall-time.
It just samples the program counter, on CPU-time, and attributes each sample to whichever function it lands in.
Its main claim to fame, compared to earlier profilers, is that since PC-only ("self time") sampling is pretty useless in any decent-sized app where the call stack is many layers deep,
it also counts how many times each function A calls each function B.
Then it tries to guess (by some pretty shaky math) how much CPU time can be charged back to the higher-level routines that invoke the lower-level ones.
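Roughly, the guess works like this (the notation here is mine, just to illustrate): B's measured self time is split among B's callers in proportion to call counts, so if A called B 90 times and C called B 10 times, A gets charged with 90% of B's self time:

    charge(A for B) ~= self_time(B) * calls(A->B) / total_calls(B)

The shaky part is the hidden assumption that every call to B costs about the same, which is often exactly what is not true when you have a performance problem.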
There are profilers that take stack traces on wall-time.
(CPU-time means that if your app is somehow blowing time at a very low level by sleeping, doing I/O, hanging on a semaphore, or blocking in some other way, you will never see it.)
I know of one that stack-samples on wall-time, namely Zoom.
I'm told OProfile can do it, but I can't verify it.
Same for DTrace.
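To make that blind spot concrete, here is a little sketch (the file and function names are made up for illustration): a program that spends nearly all of its wall time blocked, and only a sliver of it computing.

    /* blindspot.c - nearly all wall time is spent blocked, not computing.
     * A CPU-time PC sampler (like gprof) sees almost nothing here;
     * a wall-time stack sampler would show most samples under do_io(). */
    #include <stdio.h>
    #include <unistd.h>

    static void do_io(void)
    {
        usleep(100000);   /* stand-in for a read, lock wait, etc.: 100 ms blocked, ~0 CPU */
    }

    static double do_math(void)
    {
        double s = 0.0;
        for (int i = 0; i < 100000; i++)   /* a trivial amount of CPU work */
            s += i * 0.5;
        return s;
    }

    int main(void)
    {
        double total = 0.0;
        for (int i = 0; i < 100; i++) {    /* ~10 seconds of wall time, almost all of it blocked */
            do_io();
            total += do_math();
        }
        printf("%f\n", total);
        return 0;
    }

Build it with -pg and run gprof on it, and the flat profile will charge essentially all of the (tiny) CPU time to do_math; the ten-odd seconds spent blocked in do_io never show up anywhere. A wall-time stack sampler shows the opposite picture, which is the true one.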
But that's just talking about the front end, the taking of samples.
Just as important is the back end, the part that presents stuff to you.
Typically you get "hot paths", "call graphs", "flame graphs", etc. etc.
Personally, I take a jaundiced view of all these spiffy toys.
What they do, they do well, no question.
But if speedup results are what is needed,
then the best information comes from a small number of stack samples,
taken at the time you care about, that are actually looked at and understood, not just summarized.
There is no summarizer that recognizes patterns better than the head of a programmer,
and any problem big enough to be worth fixing will be evident in a small number of samples.
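To put a rough number on "small": if the problem is on the stack for a fraction f of the time you care about, each random-time stack sample has probability f of landing in it, so the chance of seeing it at least once in n samples is

    1 - (1 - f)^n

For something costing 30% of the time, 10 samples give you about a 97% chance of catching it, and on average you see it 3 times, which is what makes it jump out at you.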
Here's an example,
and here's another,
and if you want to see some real math behind it, look here.