As Schumi said, you can use something like pstack to get stack samples.
However, what you really need to know is *why* the program was spending that instant of time, at the moment the sample was taken.
Maybe you can figure that out from only a stack of function names.
It's better if you can also see the lines of code where the calls occurred.
It's better still if you can see the argument values and data context.
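As a sketch of what taking one such sample can look like (assuming a Linux box with gdb installed, and a hypothetical process name `myprog` - substitute your own), gdb can attach, dump every frame with its source line, arguments, and locals, and detach:

```shell
# One sample: attach to the running process, print a full backtrace
# (function names, source lines, arguments, local variables), detach.
# Repeat a handful of times by hand, pausing at random moments.
gdb -p "$(pgrep myprog)" -batch -ex 'bt full'
```

Each invocation is one sample; what matters is reading it carefully, not counting it.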
The reason is that, contrary to the popular conception that you are looking for "hot spots", "slow methods", or "bottlenecks" - i.e. taking a measurement-based perspective - the most valuable thing to look for is work being done that could be eliminated.
In other words, when you halt the program in the debugger, consider whatever it is doing as if it were a bug.
Try to find a way not to do that thing.
However, resist doing this until you take another sample and see it doing the same thing - however you describe that thing.
Now you know it's taking significant time.
How much time? It doesn't matter - you'll find out after you fix it.
You do know that it's a lot. The fewer samples you had to take before seeing it twice, the bigger it is.
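To put a rough number on "the fewer samples, the bigger it is": if a removable activity costs fraction f of the run time, each random pause lands on it with probability f, so seeing it twice in only a few samples means f is large. A small sketch of that arithmetic (the 30% figure is a made-up example):

```python
def p_seen_at_least_twice(f, n):
    # Probability that an activity costing fraction f of run time
    # shows up in at least 2 of n random stack samples (binomial).
    p_zero_hits = (1 - f) ** n
    p_one_hit = n * f * (1 - f) ** (n - 1)
    return 1 - p_zero_hits - p_one_hit

# A 30% "speed bug" is very likely caught twice within 10 samples:
print(round(p_seen_at_least_twice(0.30, 10), 3))  # -> 0.851
```

So you rarely need many samples before the big problems betray themselves twice.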
Then there's a "magnification effect". After you fix that "speed bug", the program will take a lot less time - but that wasn't the only one.
There are others, and now they take a larger fraction of the time.
So do it all again.
By the time you finish this, if the program is any bigger than a toy, you could be amazed at how much faster it is.
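The magnification effect compounds multiplicatively: each fix removes a fraction of the time that *remains*, so the overall speedup is the product of the individual speedups, not their sum. A sketch with made-up fractions:

```python
def compound_speedup(fractions):
    # Each fix removes a fraction of the *remaining* run time,
    # so speedups multiply rather than add.
    remaining = 1.0
    for f in fractions:
        remaining *= (1 - f)
    return 1 / remaining

# Three fixes, each removing half of what's left: 2 * 2 * 2 = 8x total.
print(compound_speedup([0.5, 0.5, 0.5]))  # -> 8.0
```

This is why stopping after the first fix leaves most of the possible speedup on the table.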
Here's a 43x speedup.
Here's a 730x speedup.
Here's the dreary math behind it.
You see, the problem with tools is that you're paying a price for that ease of sampling.
Since you're thinking of it as measurement, you're not concentrating on the reasons why the code is doing what it's doing - dubious reasons.
That causes you to miss opportunities to make the code faster,
causing you to miss the magnification effect,
causing you to stop far short of your ultimate possible speedup.
EDIT: Apologies for the flame. Now to answer your question - I do not turn on compiler optimization until the very end, because it can mask bigger problems.
Then I try to do a build that has optimization turned on, but still has symbolic information so the debugger can get a reasonable stack trace and examine variables.
When I've hit diminishing speedup returns, I can see how much difference the optimizer made just by measuring overall time - don't need a profiler for that.