I'm currently calling oprofile with these parameters:

operf --callgraph --vmlinux /usr/lib/debug/boot/vmlinux-$(uname -r) <BINARY>
opreport -a -l <BINARY>

As an example, the output is:

CPU: Core 2, speed 2e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 90000
samples  cum. samples  %        cum. %     image name               symbol name
12635    12635         27.7674  27.7674    libc-2.15.so             __memset_sse2
9404     22039         20.6668  48.4342    vmlinux-3.5.0-21-generic get_page_from_freelist
4381     26420          9.6279  58.0621    vmlinux-3.5.0-21-generic native_flush_tlb_single
3684     30104          8.0962  66.1583    vmlinux-3.5.0-21-generic page_fault
701      30805          1.5406  67.6988    vmlinux-3.5.0-21-generic handle_pte_fault

You can see that most of the time is spent within __memset_sse2, but it is not obvious from the output above which part of my own code should be optimized.

In my specific case, I was able to quickly locate the source of the problem by using some kind of poor man's profiler. I ran the program in a debugger, stopped it from time to time and looked at the call stacks of each thread.

Is it possible to get the same results directly from the output of oprofile? The strategy that I used will most likely fail if the performance bottleneck is not as obvious as it was in my example.

Is there an option to ignore all calls to external functions (e.g., to the kernel or libc) and attribute their time to the caller instead? For example:

void foo() {
  // some expensive call to memset...
}

Here, it would be more insightful for me to see foo at the top of the profiling output, not memset.

(I tried opreport --exclude-dependent but did not find it helpful, as it seems only to omit the external functions from the output.)
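
To make the desired attribution concrete, here is a minimal Python sketch of the idea. It assumes the call stacks have already been collected by some other means (for example, by the debugger sampling described above) and are available as lists of (image, symbol) frames ordered from innermost to outermost. The names `attribute_to_own_code` and `OWN_IMAGES`, the image name `mybinary`, and the sample data are all hypothetical; this is not an oprofile feature, only an illustration of the report I am after.

```python
from collections import Counter

# Hypothetical: images that count as "my own code"; all other frames are trimmed.
OWN_IMAGES = {"mybinary"}


def attribute_to_own_code(stacks, own_images=OWN_IMAGES):
    """Charge each sample to the innermost frame that belongs to own_images.

    stacks: iterable of call stacks, each a list of (image, symbol) tuples
            ordered from the innermost (currently executing) frame outwards.
    Returns a list of (symbol, percent) pairs, highest percentage first.
    """
    counts = Counter()
    total = 0
    for stack in stacks:
        total += 1
        # Skip library and kernel frames until the first frame from our own image.
        owner = next((sym for image, sym in stack if image in own_images), None)
        counts[owner if owner is not None else "<external only>"] += 1
    return [(sym, 100.0 * n / total) for sym, n in counts.most_common()]


# Made-up samples for illustration only (not real profiler output).
samples = [
    [("libc-2.15.so", "__memset_sse2"), ("mybinary", "foo"), ("mybinary", "main")],
    [("vmlinux", "page_fault"), ("libc-2.15.so", "__memset_sse2"),
     ("mybinary", "foo"), ("mybinary", "main")],
    [("mybinary", "bar"), ("mybinary", "main")],
    [("libc-2.15.so", "__memset_sse2"), ("mybinary", "foo"), ("mybinary", "main")],
]

report = attribute_to_own_code(samples)
for symbol, percent in report:
    print(f"{percent:6.2f}%  {symbol}")
```

With these made-up samples, foo ends up at the top of the report even though the CPU was executing memset or kernel code at the moment of sampling, which is the view I would like to get from the profiler.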

Philipp Claßen
  • Does oprofile do stack samples, and report the percent of time a routine or line of code is on the stack? That's what you need, and that's what [*Zoom*](http://www.rotateright.com/) does. [*Here's*](http://stackoverflow.com/a/378024/23771) the way I do it. – Mike Dunlavey Dec 25 '12 at 14:39
  • Comes close. The technique that you describe certainly works. Closer to my question would be an automated tool that trims the stack by ignoring all unmodifiable functions (library code, kernel stuff) and then reports the percentage of time that each function remains on top of the trimmed stack. I think that would be great for locating bottlenecks in your own code. (Don't know if Zoom can do that. I gave it a quick try but frankly I couldn't get it to work.) – Philipp Claßen Dec 26 '12 at 03:30