Right now it tells me a method takes a lot of time by itself, which is
false because it's the external system DLL call which takes all the
time, but it doesn't display this info.
That's OK.
If you knew that some system routine owned the program counter a lot, how would that help you?
You still need to figure out what in your code authorized it.
Example: Memory allocation is a system function that often takes a large fraction of time.
Does that mean you need a faster memory allocator?
No, it means you need to execute new less often.
You should look for routines (or even better - lines) in your code whose inclusive wall-clock time (self plus callees) is a large percent of overall time.
(Don't look for high call counts or high milliseconds. Look for a high percent.)
Why? Because that's the fraction of overall time it is responsible for.
If you could somehow make the routine or line take no time, the overall time would decrease by that percent.
Usually the way you do this is by having it make fewer subordinate calls, or maybe none at all.
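To make "inclusive percent" concrete, here is a minimal sketch of how it could be computed from a handful of stack samples. The sample data and the function names in it (main, load, parse_line and so on) are made up for illustration; a real profiler or a few manual pauses in a debugger would supply the stacks.

```python
from collections import Counter

def inclusive_percent(samples):
    """Map each frame to the percent of samples in which it is on the stack."""
    if not samples:
        return {}
    counts = Counter()
    for stack in samples:
        # Count a frame once per sample, even if recursion repeats it.
        for frame in set(stack):
            counts[frame] += 1
    total = len(samples)
    return {frame: 100.0 * n / total for frame, n in counts.items()}

# Hypothetical stack samples, outermost caller first.
samples = [
    ["main", "load", "parse_line", "malloc"],
    ["main", "load", "parse_line", "malloc"],
    ["main", "load", "read_file"],
    ["main", "render"],
    ["main", "render", "draw"],
]

percents = inclusive_percent(samples)
for frame, pct in sorted(percents.items(), key=lambda kv: -kv[1]):
    print(f"{frame:12s} {pct:5.1f}%")
```

In this made-up data the system allocator and the parse_line routine that calls it carry the same inclusive percent, which is the point: the number to act on belongs to your own code that is requesting the work.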
For example, if your program takes 10 seconds and there is a line of code that does new whose inclusive percent is 20% (i.e. that line of code and its enclosing routine are on the stack 20% of the time), then executing that line a lot less, or not at all, would save up to 2 seconds.
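The arithmetic in that example can be written out as a small sketch. The function name and the fraction_removed parameter are assumptions added for illustration; they simply express "executed a lot less" as a fraction of the line's executions that you manage to eliminate.

```python
def projected_saving(total_seconds, inclusive_percent, fraction_removed=1.0):
    """Estimate seconds saved by removing some fraction of a line's executions.

    fraction_removed=1.0 means the line is never executed at all.
    """
    return total_seconds * (inclusive_percent / 100.0) * fraction_removed

# The case from the text: a 10-second run, a line with 20% inclusive time.
saving_if_removed = projected_saving(10.0, 20.0)
# Hypothetical variation: only half of the executions are eliminated.
saving_if_halved = projected_saving(10.0, 20.0, fraction_removed=0.5)

print(saving_if_removed, saving_if_halved)
```

This is only an estimate: it assumes the cost of the line scales with how often it runs.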