22

I would like to know how to profile a pthread mutex to see if there are any locking contention points in my code. (who likes contentious code, right? :) I know how to do a more general profiling of the code, as I mention here. But I would like to know if there are any tools or options available to be able to profile mutex locking that would provide metrics/stats about mutex locking contentions to see if I have any problem areas.

Here's some background and context:

Recently I worked on an embedded C++ project using a Cavium Octeon CPU. The Octeon SDK implements mutex style synchronization using spinlocks. Looking through the Octeon documentation, I came across a way to profile the spinlocks to be able to see how many times each spinlock had to spin while waiting for the lock to become available. To use this I had to do a conditional compile and then it would increment a counter each time the spinlock spun, then I could query the spinner wait value. So, what I did was to encapsulate the spinlock and added the ability to dump the spinlock spinner wait value for all of the spinlocks used in the system. The actual value didnt mean much, but there were a few that had really high values compared to the rest, and I focused on reducing the contention for those.

I know this is probably quite easy for spinlocks, since its just a counter per spin, but reading through the related pthread man pages and header files I havent found anything similar, is there something available for pthread mutex?

I would really like to avoid having to do something hacky like taking the time before and after each lock.

PS: What's the plural of mutex? mutexes, muteces, mutexi, muti??? Mutexes never sounded right to me.

Community
  • 1
  • 1
Brady
  • 10,207
  • 2
  • 20
  • 59

5 Answers5

17

The valgrind tool drd lets you specify a limit on how long a lock should be waited on before reporting an error.

This site mentions drd and also mentions their own tool called mutrace which looks like the kind of tool you're after. It tells you:

  • how many times a mutex was locked
  • how many times the mutex owning thread changed
  • how many times a mutex was contended (already locked when a lock request was made)
  • various stats on the duration a mutex was locked for

e.g.

mutrace: 10 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]       Type
      35   368268      407      275      120,822        0,000        0,894     normal
       5   234645      100       21       86,855        0,000        0,494     normal
      26   177324       47        4       98,610        0,001        0,150     normal
      19    55758       53        2       23,931        0,000        0,092     normal
      53      106       73        1        0,769        0,007        0,160     normal
      25    15156       70        1        6,633        0,000        0,019     normal
       4      973       10        1        4,376        0,004        0,174     normal
      75       68       62        0        0,038        0,001        0,004     normal
       9     1663       52        0        1,068        0,001        0,412     normal
       3   136553       41        0       61,408        0,000        0,281     normal
     ...      ...      ...      ...          ...          ...          ...        ...

mutrace: Total runtime 9678,142 ms.
TheJuice
  • 4,434
  • 2
  • 26
  • 36
3

You didn't mention your OS. If it's linux, I've used LTT quite a bit for this sort of profiling. It can be useful in two modes:

  1. performance profiling -- e.g. profile your system for N seconds of standard use and then analyze the data

  2. exception profiling -- e.g. run it around the clock in "flight recorder" mode (record the last N seconds of activity) and on some exceptional condition trigger a stop to the flight recorder. Come in the next morning and analyze the data.

I've been using and seeing 'mutexes' as the plural of mutex for years now without complaint. ;>

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
Kylo
  • 322
  • 1
  • 7
  • Good point, I updated the question title to mention linux, thanks. – Brady Jun 01 '12 at 15:22
  • I like this tool and am evaluating it and the valgrind drd tool today, and hope to have results soon. Even if I dont use LTT for the mutex profiling, it looks like a very useful tool that I will use in the future, thanks! – Brady Jun 04 '12 at 08:37
3

I found 'mutrace' while exploring for the same use case. I did not use it yet though. It sounded good as it would not affect the runtime much as in valgrind.

Dženan
  • 3,329
  • 3
  • 31
  • 44
sunil
  • 3,507
  • 18
  • 25
2

You might want to also give Intel VTune a try. It would report Wait Time and Wait Count for each sync primitive with call stacks too. Look through this document to see if this is what you're looking for http://software.intel.com/sites/products/documentation/hpc/amplifierxe/en-us/2011Update/lin/start/analyzing_locks_linux.pdf

Anton Pegushin
  • 450
  • 3
  • 11
  • Anton, looks very interesting, thanks. I imagine this is only available with the intel compiler, right? The execution time display is really cool! – Brady Jun 05 '12 at 13:20
  • Nope, it's a separate product, you would not need intel compiler. The binary can be compiled with a compiler of your choice, the analysis engine instruments the resulting binary to locate threading and synchronization APIs and then runs the binary to trace these API calls. Intel has a pretty generous evaluation license for all these products - full functionality is available for 30 days at least (could be 60, not sure), you'll definitely have enough time to profile your application. – Anton Pegushin Jun 06 '12 at 09:32
0

Its been 6+ years, but for the reference I would like to add another link for off-cpu time tracing which supposed to show your mutex contention. http://www.brendangregg.com/offcpuanalysis.html

kreuzerkrieg
  • 3,009
  • 3
  • 28
  • 59