First of all, Don't Look At Self-Time.
The only reason it's in there is that somebody might write code that spends too much time in tight CPU loops of their own.
Does your program spend all its time inverting matrices or doing FFTs?
The whole reason gprof was invented, and the whole avalanche of profilers that followed, is that real software contains lots of subroutines that call each other in a big rat's nest, and by far the easiest way to take too much time is to make subroutine calls that could be avoided.
Self time does not expose those.
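To make that concrete, here is a minimal sketch using Python's cProfile. The functions `lookup`, `report`, and `profile_report` are made up for illustration. The caller, `report`, does almost nothing itself, so its self time (`tottime`) is tiny, yet nearly all of its inclusive time (`cumtime`) comes from the call it makes.

```python
import cProfile
import pstats


def lookup(table, key):
    # Hypothetical helper: a linear scan, so each call is expensive.
    for k, v in table:
        if k == key:
            return v
    return None


def report(table, keys):
    # Hypothetical caller: almost no work of its own. The cost is in
    # the lookup() calls, which a dict would make unnecessary.
    return [lookup(table, k) for k in keys]


def profile_report():
    """Profile report() and return its (self time, inclusive time)."""
    table = [(i, i * i) for i in range(2000)]
    keys = list(range(0, 2000, 4))

    profiler = cProfile.Profile()
    profiler.enable()
    report(table, keys)
    profiler.disable()

    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    for (_file, _line, name), (_cc, _nc, tottime, cumtime, _callers) in stats.stats.items():
        if name == "report":
            return tottime, cumtime
    raise RuntimeError("report() was not profiled")


if __name__ == "__main__":
    self_time, inclusive_time = profile_report()
    print(f"report: self={self_time:.4f}s inclusive={inclusive_time:.4f}s")
```

If you ranked by self time, `report` would sit near the bottom of the list, even though the line in it that calls `lookup` is the one you would want to change.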
Next, whether you are looking at wall time or CPU time, if WaitForSingleObject() is active a large fraction of the time, that means your thread is mostly waiting for something.
You need to find out what.
There is a very simple way to find out. Just hit Pause, Ctrl-C, Esc, or whatever key makes it halt in its tracks, and then look at the stack.
Every line of code on the call stack represents an unfulfilled request, and if any one of those requests had not been made, the program would not be waiting.
So that's how you know what it is waiting for.
If you happen to catch it when it's not in WaitForSingleObject(), do it again, maybe a few times.
This may seem to be more effort than profiling, but you can choose between finding the problem and doing something easy.
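If you would rather not do the pausing by hand, the same idea can be sketched in a few lines of Python. This is only an illustration under some assumptions: `threading.Event.wait()` stands in for WaitForSingleObject(), the functions `handle_request`, `fetch_settings`, and `wait_for_reply` are invented, and `sys._current_frames()` is a CPython-specific call. It takes a handful of stack samples from a thread that is stuck waiting and lists what is on the stack each time.

```python
import sys
import threading
import time
import traceback


def wait_for_reply(done):
    # Stands in for a blocking call such as WaitForSingleObject().
    done.wait()


def fetch_settings(done):
    wait_for_reply(done)


def handle_request(done):
    fetch_settings(done)


def sample_stacks(thread, samples=5, interval=0.02):
    """Take a few snapshots of the thread's call stack, outermost call first."""
    stacks = []
    for _ in range(samples):
        frame = sys._current_frames().get(thread.ident)
        if frame is not None:
            stacks.append(
                [f"{fs.name}:{fs.lineno}" for fs in traceback.extract_stack(frame)]
            )
        time.sleep(interval)
    return stacks


def demo():
    done = threading.Event()
    worker = threading.Thread(target=handle_request, args=(done,))
    worker.start()
    time.sleep(0.05)          # give the worker time to reach the wait
    stacks = sample_stacks(worker)
    done.set()                # release the worker so the program can exit
    worker.join()
    return stacks


if __name__ == "__main__":
    for stack in demo():
        print(" -> ".join(stack))
```

Every sample should show the wait at the bottom with `fetch_settings` and `handle_request` above it, and any one of those lines is a candidate for "why are we waiting?"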