simple steps on how I would profile a specific function
The method I use is random pausing, maybe better known as the poor man's profiler (which I liked until I saw how it aggregates the samples).
Here's another link.
You don't have to tell it what specific function to look at.
It automatically finds what takes the most time, whether it's that function or another one.
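If you would rather not hit pause by hand in a debugger, the idea can be roughly imitated in code. What follows is only a minimal Python sketch of it, not a replacement for looking at the stacks yourself. It assumes CPython (it relies on `sys._current_frames()`), and every name in it (`sample_stacks`, `fraction_on_stack`, `workload`, `slow_part`, `fast_part`) is made up for illustration. A background thread grabs the main thread's call stack at random moments, and each (function, line) entry is then scored by the fraction of samples in which it appears anywhere on the stack.

```python
import collections
import random
import sys
import threading
import time
import traceback


def sample_stacks(target, n_samples=20, mean_gap=0.01):
    """Run target() in the calling thread and capture its stack at random moments.

    Hypothetical helper. Assumes CPython, since sys._current_frames() is
    an implementation-specific function.
    """
    caller_id = threading.get_ident()
    samples = []
    done = threading.Event()

    def sampler():
        while len(samples) < n_samples:
            # Wait a random interval; stop at once if the target has finished.
            if done.wait(random.uniform(0, 2 * mean_gap)):
                break
            frame = sys._current_frames().get(caller_id)
            if frame is not None:
                stack = traceback.extract_stack(frame)
                samples.append([(f.name, f.lineno) for f in stack])

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        target()
    finally:
        done.set()
        thread.join()
    return samples


def fraction_on_stack(samples):
    """Map each (function, line) to the fraction of samples that contain it."""
    if not samples:
        return {}
    counts = collections.Counter()
    for stack in samples:
        # Count an entry once per sample, even if recursion repeats it.
        for entry in set(stack):
            counts[entry] += 1
    return {entry: n / len(samples) for entry, n in counts.most_common()}


def fast_part():
    return sum(range(100))


def slow_part():
    # Busy-wait so that this function dominates the running time.
    end = time.perf_counter() + 0.4
    while time.perf_counter() < end:
        pass


def workload():
    fast_part()
    slow_part()


if __name__ == "__main__":
    result = fraction_on_stack(sample_stacks(workload))
    for (name, lineno), fraction in list(result.items())[:5]:
        print(f"{fraction:5.0%}  {name}:{lineno}")
```

Note that nothing tells the sampler to look at `slow_part`. It shows up simply because it is on the stack in most samples. Scoring by "fraction of samples containing this line" rather than by self time is a deliberate choice here: a line that is on the stack in a large fraction of samples is costing roughly that fraction of the time, whether it is a leaf or a call site.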
Compared to "real" profilers, some might say it is crude, tedious, etc.
So why do it?
Because it's effective.
You nail your problem, down to specific instructions, and know what to fix, before you even begin to puzzle through the reams of output that come out of most profilers: self time, call counts, call trees, call graphs, "hot paths", "cpu time", etc.
The thing to ask of any profiler is not how "accurate" it is, but
- How much speedup is typically achieved by using it, in real (not toy) programs?
Isn't that what you care about?
Here's an example of a 43x speedup achieved with random pausing.