Linux time sample based profiler

Question

short version:

Is there a good time based sampling profiler for Linux?

long version:

I generally use OProfile to optimize my applications. I recently found a shortcoming that has me wondering.

The problem was a tight loop, spawning c++filt to demangle a c++ name. I only stumbled upon the code by accident while chasing down another bottleneck. The OProfile didn't show anything unusual about the code so I almost ignored it but my code sense told me to optimize the call and see what happened. I changed the popen of c++filt to abi::__cxa_demangle. The runtime went from more than a minute to a little over a second. About a x60 speed up.

Is there a way I could have configured OProfile to flag the popen call? As the profile data sits now OProfile thinks the bottle neck was the heap and std::string calls (which BTW once optimized dropped the runtime to less than a second, more than x2 speed up).

Here is my OProfile configuration:

$ sudo opcontrol --status
Daemon not running
Event 0: CPU_CLK_UNHALTED:90000:0:1:1
Separate options: library
vmlinux file: none
Image filter: /path/to/executable
Call-graph depth: 7
Buffer size: 65536

Is there another profiler for Linux that could have found the bottleneck?

I suspect the issue is that OProfile only logs its samples to the currently running process. I'd like it to always log its samples to the process I'm profiling. So if the process is currently switched out (blocking on IO or a popen call) OProfile would just place its sample at the blocked call.

If I can't fix this, OProfile will only be useful when the executable is pushing near 100% CPU. It can't help with executables that have inefficient blocking calls.

score 6 · Accepted Answer · edited May 23 '17 at 11:46

6

Glad you asked. I believe OProfile can be made to do what I consider the right thing, which is to take stack samples on wall-clock time when the program is being slow and, if it won't let you examine individual stack samples, at least summarize for each line of code that appears on samples, the percent of samples the line appears on. That is a direct measure of what would be saved if that line were not there. Here's one discussion. Here's another, and another. And, as Paul said, Zoom should do it.

If your time went from 60 sec to 1 sec, that implies every single stack sample would have had a 59/60 probability of showing you the problem.

edited May 23 '17 at 11:46

Community

1
1

answered Mar 15 '10 at 18:55

Mike Dunlavey

40,059
14
91
135

1

Mike, your point is very valid, I agree the technique with 100%. Any ideas on how to enable time based sampling via OProfile or in a more automated approach than just breaking in the debugger? – deft_code Mar 15 '10 at 20:03
@Caspin: I'm on Windows, and I'm not a user of OProfile, but this link (http://oprofile.sourceforge.net/doc/opreport.html) talks about its use and presentation of stack sample data. Also this link (http://oprofile.sourceforge.net/doc/detailed-parameters.html#timer) talks about timer interrupts. I can't tell if it will take samples during I/O or other blocking calls. – Mike Dunlavey Mar 15 '10 at 20:27
... note that the frequency of sampling does not need to be fast, but it does need to be able to sample during blocking calls unless you want to be blind to those. – Mike Dunlavey Mar 15 '10 at 20:29
2

... I forgot to mention, **lsstack** is an easy way to get stack samples manually, as is **pstack** (but without symbols). – Mike Dunlavey Mar 15 '10 at 20:43

score 3 · Answer 2 · answered Mar 15 '10 at 18:57

3

Try Zoom - I believe it will let you profile all processes - it would be interesting to know if it highlights your problem in this case.

answered Mar 15 '10 at 18:57

Paul R

208,748
37
389
560

Zoom version 1.6.6 does not find the problem either. The next version of Zoom will supposedly have a sampling mode (*"thread time profiling"*) that could find the problem. – deft_code Jun 16 '10 at 23:02

score 2 · Answer 3 · answered Apr 04 '14 at 22:29

2

I wrote this a long time ago, only because I couldn't find anything better: https://github.com/dicej/profile

I just found this, too, though I haven't tried it: https://github.com/oliver/ptrace-sampler

answered Apr 04 '14 at 22:29

Joel Dice

31
1

score 1 · Answer 4 · answered Oct 22 '10 at 02:24

1

Quickly hacked up trivial sampling profiler for linux: http://vi-server.org/vi/simple_sampling_profiler.html

It appends backtrace(3) to a file on SIGUSR1, and then converts it to annotated source.

answered Oct 22 '10 at 02:24

Vi.

37,014
18
93
148

Take a look at lsstack is implemented. It doesn't need special driver code of get the current backtrace. Also put your code out on bitbucket or google code. If you get a decently done project I'll contribute bug fixes as I use it. – deft_code Oct 22 '10 at 17:54
@caspin, OK, Now I'm looking how to make use of gdb (http://stackoverflow.com/questions/3999464/how-to-make-gdb-get-stacktrace-repeatably) to do the same thing. – Vi. Oct 22 '10 at 18:37
I tried your code, but it hangs, because you are using fopen from a signal handler, which is known to hang (you can only use very few functions from a signal handler). – Ondřej Čertík Apr 24 '14 at 18:52
Should it work reliably with low-level "open/write/close" instead of fopen? – Vi. Apr 25 '14 at 15:23

score 1 · Answer 5 · answered Jan 22 '19 at 06:17

After trying everything suggested here (except for the now-defunct Zoom, which is still available as huge file from dropbox), I found that NOTHING does what Mr. Dunlavey recommends. The "quick hacks" listed above in some of the answers wouldn't build for me, or didn't work for me either. Spent all day trying stuff... and nothing could find fseek as a hotspot in an otherwise simple test program that was I/O bound.

So I coded up yet another profiler, this time with no build dependencies, based on GDB, so it should "just work" for almost any debuggable code. A single CPP file.

https://github.com/jasonrohrer/wallClockProfiler

It automates the manual process suggested by Mr. Dunlavey, interrupting the target process with GDB periodically and harvesting a stack trace, and then printing a report at the end about which stack traces are the most common. Those are your true wall-clock hotspots. And it actually works.

Linux time sample based profiler

5 Answers5

Linked