I want to improve the performance of a specific method inside a larger application.
The goal is to improve latency (wall-clock time spent in a specific function), not (necessarily) system load.
Requirements:
- As I expect a lot of the latency to be due to I/O, the tool must take into account time spent waiting/blocked (in other words: look at wall-clock time instead of CPU time)
- As the program does much more than the fragment I'm trying to optimize, there needs to be a way to either start/stop profiling programmatically, or to filter the output to only show the time between entering and exiting the function I'm optimizing.
- Profiling on the method level is acceptable (if it can be done on the instruction level, even better; if it only profiles system calls, that's probably not enough)
- This is for a hobby project, so expensive tools are not really an option
- Instrumentation (-finstrument-functions) is acceptable
- The critical piece of code I'm interested in is hard to interrupt manually (because it is already relatively fast and hard to realistically invoke in a loop), so some kind of automation is necessary.
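To make the first two requirements concrete, this is roughly the kind of measurement I mean: a minimal sketch that times one call using `clock_gettime(CLOCK_MONOTONIC)`, so that time spent blocked is counted too. The names `now_ns`, `time_call_ns` and `critical_section` are hypothetical and only for illustration; `critical_section` just sleeps to stand in for blocking I/O.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: monotonic wall-clock time in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical stand-in for the function being optimized:
 * it blocks for `ms` milliseconds, like a slow I/O call would. */
static void critical_section(long ms)
{
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&req, NULL);
}

/* Runs fn(arg) once and returns the elapsed wall-clock time in
 * nanoseconds, including any time the call spent blocked. */
static long long time_call_ns(void (*fn)(long), long arg)
{
    long long start = now_ns();
    fn(arg);
    return now_ns() - start;
}
```

Calling `time_call_ns(critical_section, 20)` should report roughly 20 ms even though almost no CPU time is used. A total like this is easy to get by hand; what I'm missing is the per-function breakdown inside that window.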
Tools discarded so far:
- gprof, oprofile, callgrind (requirement 1)
- building something custom using getrusage (requirement 1)
- poormansprofiler.org (requirement 2)
- strace -T, dtrace, perf (http://perf.wiki.kernel.org) (requirements 2 and 3)
- VTune, Zoom (requirement 4)
- manual call-stack sampling (requirement 6)
- google-perftools (should be able to measure wall time, but this does not appear to work in my case, presumably because of SIGALRM interference)
- systemtap (my kernel isn't patched to include utrace)
Other options which I haven't further evaluated yet:
- cprof (doesn't build here out-of-the-box, seems i386-only)
- manually inserting trace points (e.g. with lttng)
I'd love to hear about:
- other options
- perhaps I discarded some tool too soon?
- whether or not the options I haven't evaluated yet have a chance of working, and if so, how to best do it.
I finally settled on:
- building something custom using -finstrument-functions myself, based on http://balau82.wordpress.com/2010/10/06/trace-and-profile-function-calls-with-gcc/
The trace produced by this crude tool is hard to interpret, and I can easily imagine some tools for further processing its output making it infinitely more useful. However, this did the job for me for now, so I'm putting that project off until later ;).
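For anyone wanting to try the same approach, here is a minimal sketch of the idea, not my actual code. GCC calls `__cyg_profile_func_enter` and `__cyg_profile_func_exit` around every function compiled with `-finstrument-functions`; everything else here (`trace_now_ns`, `start_ns`, `depth`, `MAX_DEPTH`, the output format) is made up for illustration. It assumes a single-threaded program and writes one line per function exit to stderr.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

#define MAX_DEPTH 256                 /* hypothetical limit on nesting */

static long long start_ns[MAX_DEPTH]; /* entry timestamp per call depth */
static int depth;                     /* current call depth (single thread) */

/* The hooks and their helper must not be instrumented themselves,
 * otherwise they would recurse into each other. */
__attribute__((no_instrument_function))
static long long trace_now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Called by GCC-generated code on entry to every instrumented function. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *fn, void *call_site)
{
    (void)fn;
    (void)call_site;
    if (depth < MAX_DEPTH)
        start_ns[depth] = trace_now_ns();
    depth++;
}

/* Called on exit; prints function address, depth and wall-clock ns. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *fn, void *call_site)
{
    (void)call_site;
    if (depth == 0)
        return;                       /* unmatched exit, ignore it */
    depth--;
    if (depth < MAX_DEPTH)
        fprintf(stderr, "%p %d %lld\n", fn, depth,
                trace_now_ns() - start_ns[depth]);
}
```

Link this into the program and compile the code of interest with `-finstrument-functions -g`. The printed addresses can then be mapped back to function names with `addr2line -f -e` on the binary. Calling `fprintf` on every exit adds noticeable overhead, so the absolute numbers are skewed; buffering the records and writing them out at the end would probably be better.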