
I used to do all my Linux profiling with gprof.

However, with my multi-threaded application, its output appears to be inconsistent.

Now, I dug this up:

http://sam.zoy.org/writings/programming/gprof.html

However, it's from a long time ago, and in my gprof output it does appear that gprof is listing functions used by non-main threads.

So, my questions are:

  1. In 2010, can I easily use gprof to profile multi-threaded Linux C++ applications? (Ubuntu 9.10)
  2. What other tools should I look into for profiling?
Asked by anon, edited by Guy Avraham.
  • Preferably something that doesn't slow down as much as valgrind does. – anon Mar 23 '10 at 02:36
  • Please tell me the alternatives, but not the alternatives. Check. – dmckee --- ex-moderator kitten Mar 23 '10 at 02:37
  • Well, you wanted a profiler. A profiler loads your binary and adds extra hooks to keep track of execution, so it will always be slower than running without it. Is there a reason the slowdown is a problem for you? Valgrind has been used to profile some big applications without trouble, unless you have a reason why the application can't run slower during profiling. – stefanB Mar 23 '10 at 02:40
  • @stefanB: the slowdown from gprof with g++'s -pg is fine; valgrind's slowdown is at least 10x. – anon Mar 23 '10 at 02:44
  • So when you run gprof vs. valgrind, do you get any huge differences in the results? Or do you just not like the fact that it runs slower? There might be valid reasons why you can't use a slower profiler, but I assume you want to see where your application is spending its time. – stefanB Mar 23 '10 at 03:21
  • @dmckee @stefanB why are you so harsh? It's a perfectly valid observation that valgrind slows programs down a lot. I for one go with the fastest profiler too, given the choice. – Laurynas Biveinis Mar 23 '10 at 06:37
  • Slowdown is especially annoying when you have programs with timers... we need to multiply all our timeout values by 10 when we run valgrind to trace memory leaks, which is annoying :/ – Matthieu M. Mar 23 '10 at 08:29
  • @Laurynas: Because asking a question and then changing the conditions *after* the answers start coming in is par for the course with this OP. Because the question does not actually suck, but the poster's approach is unhelpful. Because if he already knew that valgrind was not a good choice for this use and didn't mention it, he's doing it wrong. – dmckee --- ex-moderator kitten Mar 23 '10 at 13:13
  • Valgrind has a lot of brand loyalty, in spite of being 10x slower. The problem is that it doesn't slow down I/O by a proportionate factor, so if normally 30% of the time is spent in unnecessary I/O, under Valgrind it will only look like 3%. – Mike Dunlavey Mar 23 '10 at 16:43
  • @dmckee I see. I assumed the OP just forgot to state all the constraints, something that happens to me all the time, but I guess I am too generous with the benefit of the doubt. – Laurynas Biveinis Mar 24 '10 at 07:36
  • @everyone: I'm trying to profile an interactive OpenGL application running at 30fps. The 10x slowdown is unacceptable. dmckee: you are right that this requirement was never stated in the original problem. – anon Mar 24 '10 at 08:09

10 Answers


Edit: added another answer on poor man's profiler, which IMHO is better for multithreaded apps.

Have a look at oprofile. The profiling overhead of this tool is negligible, and it supports multithreaded applications, as long as you don't want to profile mutex contention (which is a very important part of profiling multithreaded applications).

Laurynas Biveinis

Try perf (perf_events), the modern Linux profiling tool. See https://perf.wiki.kernel.org/index.php/Tutorial and http://www.brendangregg.com/perf.html. Basic usage:

perf record ./application
# generates profile file perf.data
perf report
osgx

Have a look at the poor man's profiler (PMP). Surprisingly, few other tools do both CPU profiling and mutex contention profiling for multithreaded applications; PMP does both, and it doesn't even require installing anything (as long as you have gdb).

Laurynas Biveinis

Have a look at Valgrind.

stefanB
  • The problem that led me to this thread is Callgrind's weird scheduling differences and the fact that it runs everything in a single thread. I am trying to find bottlenecks caused by my atomic operations and spinlocks, and single-threading everything kills the contention and the performance problems it may cause. So Valgrind, despite my wishes to the contrary, is not always the profiler of choice. – James Matta Dec 06 '17 at 20:51

As Paul R said, have a look at Zoom. You can also use lsstack, which is a low-tech approach but surprisingly effective compared to gprof.

Added: since you clarified that you are running OpenGL at 33 ms per frame, my prior recommendation stands. In addition, what I have personally done in situations like that is both effective and non-intuitive. Get it running with a typical or problematic workload, then stop it manually in its tracks and see what it's doing and why. Do this several times. Now, if it only occasionally misbehaves, you would like to stop it only while it's misbehaving. That's not easy, but I've used an alarm-clock interrupt set for just the right delay. For example, if one frame out of 100 takes more than 33 ms, set the timer for 35 ms at the start of a frame and turn it off at the end of the frame. That way it will interrupt only when the code is taking too long, and it will show you why. Of course, one sample might miss the guilty code, but 20 samples won't miss it.

Mike Dunlavey

I tried valgrind and gprof. It is a crying shame that neither of them works well with multi-threaded applications. Later, I found Intel VTune Amplifier. The good thing is that it handles multi-threading well, works with most of the major languages, runs on Windows and Linux, and has many great profiling features. Moreover, the application itself is free. However, it only works with Intel processors.


You can run pstack at random moments, e.g. 10 or 20 times, to see the stack at each point. The most common stack is where the application spends most of its time (from experience, we can assume a Pareto distribution).

You can combine that with strace (or truss on Solaris) to trace system calls, and with pmap to see the memory footprint.

If the application runs on a dedicated system, you can also use sar to measure CPU, memory, I/O, etc. and profile the system as a whole.

  • Check the most-voted posts by Mike Dunlavey (https://stackoverflow.com/users/23771/mike-dunlavey), where he proved that 5 call-stack samples (from gdb, pstack, ...) are statistically enough. There are also 400 of his answers in the profiling tag: https://stackoverflow.com/search?q=user:23771%20[profiling]%20is:answer – osgx May 30 '17 at 20:31

Since you didn't say it has to be non-commercial, may I suggest Intel's VTune? It's not free, but the level of detail is very impressive (and the overhead is negligible).

rustyx

Microprofile is another possible answer to this. It requires hand-instrumentation of the code, but it seems like it handles multi-threaded code pretty well. And it also has special hooks for profiling graphics pipelines, including what's going on inside the card itself.

Omnifarious

Putting a slightly different twist on matters, you can actually get a pretty good idea of what's going on in a multithreaded application using ftrace and kernelshark. Collect the right trace, press the right buttons, and you can see the scheduling of individual threads.

Depending on your distro's kernel, you may have to build a kernel with the right configuration (though I think a lot of distros have it built in these days).

bazza