3

Let's say I have a contrived program:

#include <Windows.h>

void useless_function()
{
    Sleep(5000);
}

void useful_function()
{
    // ... do some work
    useless_function();
    // ... do some more work
}

int main()
{
    useful_function();
    return 0;
}

Objective: I want the profiler to tell me useful_function() is needlessly calling useless_function() which waits for no obvious reasons. Under XPerf, this doesn't show up in any of the graphs I have because the call to WaitForMultipleObjects() seem to be accounted to Idle.exe instead of my own program.

And here's the xperf command line that I currently run:

xperf -on Latency -stackwalk Profile

Any ideas?

(This is not restricted to wait functions. The above might have been solved by placing breakpoints at NtWaitForMultipleObjects. Ideally there could be a way to see the stack sample that's taking up a lot of wall-clock time as opposed to only CPU time)

kizzx2
  • 18,775
  • 14
  • 76
  • 83

3 Answers3

2

I think what you are looking for is the Wait analysis with Ready Thread functionality in Xperf. It captures every context switch and gives you the call stack of the thread once it wakes up from sleep (or an otherwise blocked operation). In your case, you would see the stack just after the call sleep(5000) as well as the time spend sleeping.

The functionality is a bit obscure to use. But it is fortunately well described here:

Use Xperf's Wait Analysis for Application-Performance Troubleshooting

Thomas Kejser
  • 1,264
  • 1
  • 10
  • 30
1

Wait Analysis is the way to do this. You should:

  • Record the CSWITCH provider, in order to get all context switches
  • Record call stacks on context switches by adding +CSWITCH to your -stackwalk argument
  • Probably record call stacks on the ready thread to get more information on who readied you (i.e.; who released the Mutex or CS or semaphore and where) by adding +READYTHREAD to your -stackwalk

Then you use CPU Usage (Precise) in WPA (or xperfview, but that's ancient) to look at the context switches and find where your TimeSinceLast is high on a thread that shouldn't be going idle. You'll typically want the columns in CPU Usage (Precise) in this sort of order:

  • NewProcess (your process being switched in)
  • NewThreadId
  • NewThreadStack
  • ReadyingProcess (who made your thread ready to run)
  • ReadyingThreadId (optional)
  • ReadyThreadStack (optional, requires +ReadyThread on -stackwalk)
  • Orange bar
  • Count
  • TimeSinceLast (us) - sort by this column, usually
  • Whatever other columns you want

For details see these particular articles from my blog: - https://randomascii.wordpress.com/2014/08/19/etw-training-videos-available-now/ - https://randomascii.wordpress.com/2012/06/19/wpaxperf-trace-analysis-reimagined/

Bruce Dawson
  • 3,284
  • 29
  • 38
0

This "profiler" will tell you - just randomly pause it a few times and look at the stack. If do some work takes 5 seconds, and do some more work takes 5 seconds, then 33% of the time the stack will look like this

main: calling useful_function
useful_function: calling useless_function
useless_function: calling Sleep

So roughly 33% of your stack samples will show exactly that. Any line of code that's costing some fraction of wall-clock time will appear on roughly that fraction of samples.

On the rest of the samples you will see it doing the other things.

There are automated profilers that do the same thing in a more pretty way, such as Zoom and LTProf, although they don't actually show you the samples.

I looked at the xperf doc, trying to figure out if you could get stack samples on wall-clock time and get percents at line-level resolution. It seems you gotta be on Windows 7 or Vista. They only bother with functions, not lines, which if you have realistically big functions, is important. I couldn't figure out how to get access to the individual samples, which I think is important for seeing why the program is spending its time.

Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
  • Thanks Mike. I've read a lot of your posts about the "manual sampling" technique. It's genuinely useful. On the other hand, Windows XPerf is otherwise an excellent tool for profiling programs -- it has stack trace for disk IO, context switches, and **heap usage**, etc. IMO wall clock time is the only piece that's missing from the otherwise comprehensive free tool. BTW the official site for LTProf seems to be down at the moment :P – kizzx2 Nov 14 '10 at 06:16
  • @kizzx2: Yeah, for heap usage I have no coherent opinion. If I come across as negative on xperf (& other tools) I truly don't wish to, but I am trying to shine light on what needs to be fixed and why. For LTProf, I bought a copy, because it looked like all the right stuff underneath. Then I found 1) you can't throttle it manually in a UI application, and 2) for presentation, they give percent-by-function, and the line percent is relative to the function. So if you want to know the real percent, you have to multiply those two. Then there's still the issue of making samples visible. – Mike Dunlavey Nov 14 '10 at 16:36
  • @kizzx2: ... The latter point is probably something fundamental, and I feel stronly about it. It's the difference between just sort of measuring things and actually understanding *why* the program is spending certain fractions of its time. I see stacks 30 layers deep and they seem totally confusing, but if I take the time to read it and look at the code at various levels, and data, then I can really understand if what it's doing *needs* to be done, and that's the key. – Mike Dunlavey Nov 14 '10 at 16:49