Is it possible to load WPR/xperf profiling data into KCacheGrind? Or is there a way to aggregate function calls in WPA directly? Or some other tool? Would the gprof2dot/graphviz route be the best option?
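
For reference, the gprof2dot/graphviz route I have in mind would look roughly like the following, if I'm reading the gprof2dot docs correctly about its xperf CSV support (the trace/output file names here are placeholders):

```
rem record a CPU sampling trace with WPR
wpr -start CPU
rem ... run the workload ...
wpr -stop trace.etl

rem decode the trace to CSV with symbols, then render a call graph
xperf -i trace.etl -o trace.csv -symbols
gprof2dot.py -f xperf trace.csv | dot -Tpng -o profile.png
```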

I find WPA useful, but grouping by stack in the "CPU Usage (Sampled)" table doesn't seem to allow sorting by a function's aggregated weight across all call sites. For example, if function foo is called equally from 10 different places, it is hard to identify foo as a potential bottleneck, since each of the 10 code paths to foo shows up as 10% or less of the weight. KCacheGrind solves this problem by letting you sort on the cumulative (inclusive) time of each function.

How can I sort by cumulative time spent in each function when profiling on Windows, e.g. to identify low-level shared functions like malloc as a bottleneck?
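
To make "cumulative" concrete, here is a toy Python sketch of the aggregation I'm after (the sample stacks are invented for illustration): every function appearing anywhere on a sampled stack gets inclusive credit, which is what lets a shared function like malloc float to the top even when no single caller dominates.

```
# Toy sketch: compute inclusive (cumulative) weight per function
# from a list of sampled callstacks, then sort descending.
from collections import Counter

# invented sample data: one callstack per CPU sample, caller -> callee
samples = [
    ["main", "parse", "malloc"],
    ["main", "render", "malloc"],
    ["main", "render", "draw"],
]

inclusive = Counter()
for stack in samples:
    # dedupe with set() so recursive frames aren't double-counted
    for func in set(stack):
        inclusive[func] += 1

# malloc ranks high even though each individual caller is a small slice
for func, weight in inclusive.most_common():
    print(func, weight)
```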

  • As a side note, how come I can't see *any* reference to malloc in my WPA data? When I do random pausing with Visual Studio I see plenty of call stacks ending in "msvcr100.dll!malloc()" -- is there a setting on WPR that I need to tick to get this level of granularity? – JDiMatteo Oct 24 '14 at 16:06
  • Don't look for a particular function. Instead take stack samples and see what lines of code are on them. A profiler that does this is Zoom from rotateright.com, but I and plenty of people feel the manual method performs better. Please take a look at [*this*](http://stackoverflow.com/a/25870103/23771). – Mike Dunlavey Oct 25 '14 at 01:32
  • @MikeDunlavey: thanks, I actually already had that question with your answer there favorited :). Still, it seems somewhat opinion-based whether the manual method or profiling tools are the way to go, and I personally feel that a combination is best, so if anyone knows how to get this WPR tool to give useful aggregated information, I'd appreciate it. Zoom, btw, doesn't work on Windows. – JDiMatteo Oct 27 '14 at 19:12
  • I suppose it seems like only an opinion - programmers aren't used to careful argument. That link shows speedups can be missed by profiler back-ends (*false negatives*). [*This link*](http://scicomp.stackexchange.com/a/2719/1262) shows why it costs you to ignore false negatives. [*This answer*](http://scicomp.stackexchange.com/a/1870/1262) gives a specific example and links to the code. In six stages, over 99.8% of time was trimmed, but any stage skipped costs dearly. That's not an isolated example. That example, and those arguments, are real - not opinion. – Mike Dunlavey Oct 27 '14 at 19:31
  • @MikeDunlavey: thanks for that summary. You have convinced me now. I'll stop wasting time with profilers and just stick to debuggers, random pausing, and maybe a logging statement every once in a while. – JDiMatteo Oct 27 '14 at 21:28
  • I don't want to preach. It's just math and experimentation. You can see for yourself. – Mike Dunlavey Oct 27 '14 at 22:07

0 Answers