You want to find out where the biggest delays are occurring, and you are not sure how to go about it.
Everyone's first instinct is to measure the time spent in methods, count how many times each is called, progressively narrow down, and use intuition to make a smart guess as to what could be fixed. However, there is another way to think about it: not in terms of time, but in terms of necessity.
The program walks a call tree. The main routine calls A, then it calls B, and so on. A calls C, then D, etc. That is walking the call tree, all the way down to the leaves, which are simple statements, system calls, and I/O. If all of those activities simply must be done, then the program is as fast as possible.
Since there are delays, some of those activities are not strictly necessary, and the way for them to be not necessary is for one of the branches leading to them (the call points) to be not strictly necessary.
Suppose a delay costs 50% of the time. Then the call points leading to it are active, and on the call stack, for 50% of the time. So if you pause the program at random and display the stack, the chance that you will see one of those call points is 50%. If you do this 10 times, it will appear on 5 of them, more or less.
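The arithmetic behind "5 of them, more or less" is just the binomial distribution. A quick sketch (the 50% fraction and 10 pauses are the numbers from the example above, not fixed parameters of the technique):

```python
from math import comb

p = 0.5   # fraction of time the delay's call point is on the stack
n = 10    # number of random pauses

# Expected number of samples showing the call point:
expected = n * p  # 5.0

# Probability the call point shows up on at least 2 of the 10 samples,
# i.e. 1 minus the probability of seeing it 0 or 1 times:
p_at_least_2 = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in (0, 1))

print(expected)                 # 5.0
print(round(p_at_least_2, 3))   # 0.989
```

So a call point responsible for half the run time is almost certain to show up on multiple samples, which is why a small number of pauses is enough.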
In fact, if you do such random pausing and examining the stack, if a call point appears on only one sample, it tells you nothing. But if it appears on more than one sample, and if it is something you could avoid doing, you've found a delay, and fixing it will make the program faster. What's more, there is almost no delay you can fix that can escape this kind of scrutiny.
As a program becomes larger and more complex, such non-strictly-necessary call points tend to creep in, just like bugs. Unlike bugs, they don't demand that you fix them; they only slow the program down, so it's good to do this once in a while.