Profiler for C++ code: Very Sleepy

Question

I'm a newbie with profiling. I'd like to optimize my code to satisfy timing constraints. I use Visual C++ 08 Express and thus had to download a profiler, for me it's Very Sleepy. I searched but found no decent tutorial on Sleepy, so here is my question: how do I use it properly? I grasp the general idea of profiling, so I sorted by %exclusive to find my bottlenecks. At the top of this list I have ZwWaitForSingleObject, RtlEnterCriticalSection, operator new, RtlLeaveCriticalSection, printf, some iterators ... and only after they have taken something like 60% comes my first function, the first entry with Child Calls. Can someone explain why the functions mentioned above show up, what they mean, and how I can optimize my code if I have no access to this critical 60% (the "source file" is shown as unknown)? Also, I expected to get a time for each line of my function, but that's not the case: e.g. arithmetic or some function calls have no timing (and they are not nested in untaken "if" clauses). And one last thing: how do I find out that some line executes very fast but is called thousands of times, making it the actual bottleneck?

Finally, is Sleepy any good? Or is there a free alternative for my platform?

Help is much appreciated! Cheers!

- - - UPDATE - - - - -

I have found another version of the profiler, called plain Sleepy. It shows how many times a snippet was called plus a line number (I guess it points to the critical line). So in my case KiFastSystemCallRet takes 50%! That means it is waiting for some data, right? How can I improve that? Is there maybe a decent approach to trace what causes these multiple calls and eventually remove or change it?

"I use Visual C++ 08 Express and thus had to download a profiler, for me it's Very Sleepy." My recommendation is coffee... Usually two cups does it for me. Your mileage may vary. — ScoPi, Nov 30 '12 at 14:53
if you want an alternative to Very Sleepy, have a look at AMD's CodeAnalyst, it comes with a few tutorials and works on Intel processors as well (you just don't get all the advanced AMD options). — Necrolis, Nov 30 '12 at 14:53
"decent approach to trace what causes these multiple calls" That's exactly what pausing tells you. You'll see KiFastSystemCallRet on the stack, and you'll see where it's called from, and where *that's* called from, etc. — Mike Dunlavey, Dec 03 '12 at 01:53
I have a Very Sleepy fork [here](https://github.com/CyberShadow/verysleepy) with a few improvements. — Vladimir Panteleev, Jan 15 '14 at 12:51

score 6 · Accepted Answer · edited May 23 '17 at 12:32

> I'd like to optimize my code to satisfy timing constraints

You're running smack into a persistent issue in this business. You want to find ways to make your code take less time, and you (and many people) assume (and have been taught) the only way to do that is by taking various sorts of measurements.

There's a minority view, and the only thing it has to recommend it is actual significant results (plus an ironclad theory behind it).

If you've got a "bottleneck" (and you do, probably several), it's taking some fraction of time, like 30%.
Just treat it as a bug to be found.

Randomly halt the program with the pause button, and look carefully to see what the program is doing and why it's doing it. Ask if it's something that could be gotten rid of. Do this 10 times. On average you will see the problem on 3 of the pauses. Any activity you see more than once, if it's not truly necessary, is a speed bug. This does not tell you precisely how much the problem costs, but it does tell you precisely what the problem is, and that it's worth fixing. You'll see things this way that no profiler can find, because profilers are only programs, and cannot be broad-minded about what constitutes an opportunity.
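To put a rough number on "do this 10 times", the pauses can be treated as independent random samples, which is an idealization. The sketch below (the function name `probSeenAtLeast` is made up for illustration) computes the chance that an activity taking fraction `p` of the run time shows up on at least `k` of `n` pauses.

```cpp
#include <cassert>
#include <cmath>

// Probability that an activity occupying fraction `p` of wall-clock time
// is seen on at least `k` of `n` independent random pauses (binomial tail).
// Assumes each pause is an independent sample of where the time goes.
double probSeenAtLeast(int n, int k, double p) {
    assert(n >= 0 && k >= 0 && p >= 0.0 && p <= 1.0);
    double below = 0.0;  // P(seen on fewer than k pauses)
    double coeff = 1.0;  // binomial coefficient C(n, i), built incrementally
    for (int i = 0; i < k && i <= n; ++i) {
        below += coeff * std::pow(p, i) * std::pow(1.0 - p, n - i);
        coeff = coeff * (n - i) / (i + 1);
    }
    return 1.0 - below;
}
```

With `n = 10`, `k = 2` and `p = 0.3` this gives about 0.85, so under these assumptions a 30% problem is quite likely to appear more than once in ten pauses.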

Some folks are risk-averse, thinking it might not give enough speedup to be worth it. Granted, there is a small chance of a low payoff, but it's like investing. The theory says on average it will be worthwhile, and there's also a small chance of a high payoff. In any case, if you're worried about the risks, a few more samples will settle your fears.

After you fix the problem, the remaining bottlenecks each take a larger percent, because they didn't get smaller but the overall program did. So they will be easier to find when you repeat the whole process.
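The arithmetic behind that can be sketched as follows. The function names and the 30% / 20% figures are illustrative assumptions, and the model assumes the fixed problem is removed entirely while everything else stays the same.

```cpp
#include <cassert>

// Share of the NEW total run time taken by a remaining bottleneck, after
// another bottleneck costing `removedFraction` of the OLD run time is removed.
double shareAfterFix(double removedFraction, double remainingFraction) {
    assert(removedFraction >= 0.0 && removedFraction < 1.0);
    assert(remainingFraction >= 0.0 &&
           removedFraction + remainingFraction <= 1.0);
    return remainingFraction / (1.0 - removedFraction);
}

// Overall speedup factor from removing `removedFraction` of the run time.
double speedupFromFix(double removedFraction) {
    assert(removedFraction >= 0.0 && removedFraction < 1.0);
    return 1.0 / (1.0 - removedFraction);
}
```

For example, after removing a 30% problem, a second problem that used to cost 20% now costs about 28.6% of the shorter run, and the program as a whole is about 1.43 times faster.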

There's lots of literature about profiling, but very little that actually says how much speedup it achieves in practice. Here's a concrete example with almost 3 orders of magnitude speedup.

John Deters · Answer 2 · 2012-11-30T16:26:35.227

I've used GlowCode (commercial product, similar to Sleepy) for profiling native C++ code. You run the instrumenting process, then execute your program, then look at the data produced by the tool. The instrumenting step injects a little trace function at every method's entry and exit points, and simply measures how much time each function takes to run through to completion.
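This is not how GlowCode itself is implemented, but the idea of entry/exit tracing can be imitated by hand with a scoped timer. The sketch below assumes a compiler with C++11 `<chrono>` (newer than Visual C++ 2008); `ScopedTimer`, `CallStats`, `profileTable` and `sumTo` are hypothetical names, and the table is not thread-safe.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Per-function totals: number of calls and accumulated inclusive time.
struct CallStats {
    long long calls = 0;
    std::chrono::nanoseconds total{0};
};

// Global table keyed by function name (single-threaded use only).
std::map<std::string, CallStats>& profileTable() {
    static std::map<std::string, CallStats> table;
    return table;
}

// Records the start time on entry and adds the elapsed time on exit.
class ScopedTimer {
public:
    explicit ScopedTimer(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {
        assert(name != nullptr);
    }
    ~ScopedTimer() {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        CallStats& stats = profileTable()[name_];
        ++stats.calls;
        stats.total +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};

// Hypothetical function under measurement.
int sumTo(int n) {
    ScopedTimer timer("sumTo");  // "injected" at the entry point
    int sum = 0;
    for (int i = 1; i <= n; ++i) sum += i;
    return sum;  // timer's destructor runs at the exit point
}
```

After the program has run, `profileTable()` holds a call count and a total time per instrumented function, which is the kind of data the tool's views are sorted on. The timer itself adds overhead, so very small functions will look slower than they are.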

Using the call graph profiling tool, I listed the methods sorted from "most time used" to "least time used"; the tool also displays a call count. Simply drilling into the highest-percentage routine showed me which methods were using the most time. I could see that some methods were very slow, but drilling into them I discovered they were waiting for user input, or for a service to respond. Others took a long time because they were calling some internal routines thousands of times per invocation. We found that someone had made a coding error and was walking a large linked list repeatedly for each item in the list, when they really only needed to walk it once.
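The original code is not shown, so the following is only a guess at the shape of that kind of mistake: reaching the i-th item by walking from the head every time, compared with a single pass. The function names and the `nodeVisits` counter are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Slow version: for every index, walk from the head of the list again.
// For n items this visits n(n+1)/2 nodes.
long long sumByIndex(const std::list<int>& items, std::size_t& nodeVisits) {
    long long total = 0;
    nodeVisits = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        std::list<int>::const_iterator it = items.begin();
        for (std::size_t step = 0; step < i; ++step) {
            ++it;
            ++nodeVisits;
        }
        assert(it != items.end());
        ++nodeVisits;  // the target node itself
        total += *it;
    }
    return total;
}

// Fast version: one pass over the list, n node visits.
long long sumSinglePass(const std::list<int>& items, std::size_t& nodeVisits) {
    long long total = 0;
    nodeVisits = 0;
    for (std::list<int>::const_iterator it = items.begin();
         it != items.end(); ++it) {
        ++nodeVisits;
        total += *it;
    }
    return total;
}
```

For a list of 1000 items both functions return the same sum, but the first visits 500500 nodes and the second only 1000. In a call-graph view this shows up as an iterator routine with an enormous call count underneath one caller.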

If you sort from "most frequently called" to "least called", you can see some of the tiny functions that get called from everywhere (iterator methods like next(), etc.). Make sure the functions that are called most often are really clean. Saving a millisecond in a routine called 500 times to paint a screen will speed that screen up by half a second. This helps you decide which are the most important places to spend your efforts.

I've seen two common approaches to using profiling. One is to do some "general" profiling, running through a suite of "normal" operations, and discovering which methods are slowing the app down the most. The other is to do specific profiling, focusing on specific user complaints about performance, and running through those functions to reveal their issues.

One thing I would caution you about is to limit your changes to those that will measurably impact the users' experience or system throughput. Shaving one millisecond off a mouse click won't make a difference to the average user, because human reaction time simply isn't that fast. Race car drivers have reaction times in the 8 millisecond range, some elite twitch gamers are even faster, but normal users like bank tellers will have reaction times in the 20-30 millisecond range. The benefits would be negligible.

Making twenty 1-millisecond improvements or one 20-millisecond change will make the system a lot more responsive. It's cheaper and better if you can make the single big improvement rather than many small ones.

Similarly, if a service handles 100 users per second one request at a time, each request takes about 10 milliseconds, so shaving one millisecond off is roughly a 10% improvement: the service could then handle about 110 users per second.
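As a back-of-the-envelope check, here is that calculation under the simplifying assumption that requests are handled strictly one after another (no parallelism or queueing effects); the function name is made up for illustration.

```cpp
#include <cassert>

// New throughput (requests per second) of a strictly serial service after
// saving `savedMsPerRequest` milliseconds on every request.
double throughputAfterSaving(double requestsPerSecond, double savedMsPerRequest) {
    assert(requestsPerSecond > 0.0);
    double msPerRequest = 1000.0 / requestsPerSecond;
    assert(savedMsPerRequest >= 0.0 && savedMsPerRequest < msPerRequest);
    return 1000.0 / (msPerRequest - savedMsPerRequest);
}
```

For 100 requests per second and 1 millisecond saved, each request drops from 10 to 9 milliseconds and the result is roughly 111 requests per second, in line with the gain of about 10% described above.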

The reason for concern is that coding changes strictly to improve performance often negatively impact your code's structure by adding complexity. Let's say you decided to improve a call to a database by caching results. How do you know when the cache goes invalid? Do you add a cache cleaning mechanism? Consider a financial transaction where looping through all the line items to produce a running total is slow, so you decide to keep a runningTotal accumulator to answer faster. You now have to modify the runningTotal for all kinds of situations like line voids, reversals, deletions, modifications, quantity changes, etc. It makes the code more complex and more error-prone.
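A minimal sketch of that trade-off, with a hypothetical `Transaction` class (amounts in cents, only quantity changes and voids handled): `total()` becomes O(1), but every operation that touches a line item now has to keep `runningTotal_` in step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LineItem {
    long long unitPriceCents;
    int quantity;
    bool voided;
};

class Transaction {
public:
    Transaction() : runningTotal_(0) {}

    // Adds an item and returns its index.
    std::size_t addItem(long long unitPriceCents, int quantity) {
        LineItem item = {unitPriceCents, quantity, false};
        items_.push_back(item);
        runningTotal_ += unitPriceCents * quantity;
        return items_.size() - 1;
    }

    void changeQuantity(std::size_t index, int newQuantity) {
        assert(index < items_.size());
        LineItem& item = items_[index];
        if (!item.voided) {  // voided items must not affect the total
            runningTotal_ += item.unitPriceCents * (newQuantity - item.quantity);
        }
        item.quantity = newQuantity;
    }

    void voidItem(std::size_t index) {
        assert(index < items_.size());
        LineItem& item = items_[index];
        if (!item.voided) {  // voiding twice must not subtract twice
            runningTotal_ -= item.unitPriceCents * item.quantity;
            item.voided = true;
        }
    }

    // Fast path: the cached accumulator.
    long long total() const { return runningTotal_; }

    // Slow reference: loop over all line items.
    long long recomputeTotal() const {
        long long sum = 0;
        for (std::size_t i = 0; i < items_.size(); ++i) {
            if (!items_[i].voided) {
                sum += items_[i].unitPriceCents * items_[i].quantity;
            }
        }
        return sum;
    }

private:
    std::vector<LineItem> items_;
    long long runningTotal_;
};
```

Each `if (!item.voided)` is a place where a bug can hide, and every new kind of modification needs its own update rule. Keeping `recomputeTotal()` around as a cross-check in tests is one way to catch the two drifting apart.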
