1

I recently found myself trying to diagnose why a particular Ruby program was running slow. In the end it turned out be caused by a scaling issue causing a lot of contention on a particular mutex.

I was wondering if there are any tools that I could have used to make this issue easier to diagnose? I know I could have used ruby-prof to get detailed output of what all 100+ threads of this program were spending their time on, but I'm curious whether there is any tool that is specifically focused on just measuring mutex contention in Ruby?

Ilya Vassilevsky
  • 981
  • 6
  • 14
Reck
  • 7,966
  • 2
  • 20
  • 24
  • I don`t have any experience with Ruby , however if you are working on a Linux that you can install Systemtap , you can use a Systemtap script which will show you contention per process/thread as I wrote its details here : http://stackoverflow.com/questions/38623976/how-to-debug-a-futex-contention-shown-in-strace/38648135#38648135 – Akin Ocal Aug 06 '16 at 15:54

1 Answers1

1

So If figured out how to do this with DTrace.

Given a Ruby program like this:

# mutex.rb
mutex = Mutex.new
threads = []

threads << Thread.new do
  loop do
    mutex.synchronize do
      sleep 2
    end
  end
end

threads << Thread.new do
  loop do
    mutex.synchronize do
      sleep 4
    end
  end
end

threads.each(&:join)

We can use a DTrace script like this:

/* mutex.d */
ruby$target:::cmethod-entry
/copyinstr(arg0) == "Mutex" && copyinstr(arg1) == "synchronize"/
{
  self->file = copyinstr(arg2);
  self->line = arg3;
}

pid$target:ruby:rb_mutex_lock:entry
/self->file != NULL && self->line != NULL/
{
  self->mutex_wait_start = timestamp;
}

pid$target:ruby:rb_mutex_lock:return
/self->file != NULL && self->line != NULL/
{
  mutex_wait_ms = (timestamp - self->mutex_wait_start) / 1000;
  printf("Thread %d acquires mutex %d after %d ms - %s:%d\n", tid, arg1, mutex_wait_ms, self->file, self->line);
  self->file = NULL;
  self->line = NULL;
}

When we run this script against the Ruby program, we then get information like this:

$ sudo dtrace -q -s mutex.d -c 'ruby mutex.rb'

Thread 286592 acquires mutex 4313316240 after 2 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4004183 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2004170 ms - mutex.rb:14
Thread 286592 acquires mutex 4313316240 after 6 ms - mutex.rb:14
Thread 286592 acquires mutex 4313316240 after 4 ms - mutex.rb:14
Thread 286592 acquires mutex 4313316240 after 4 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 16012158 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2002593 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4001983 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2004418 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4000407 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2004163 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4003191 ms - mutex.rb:6
Thread 286591 acquires mutex 4313316240 after 2 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 4005587 ms - mutex.rb:14
...

We can collect this output and use it to derive information about which mutexes are causing the most contention.

Reck
  • 7,966
  • 2
  • 20
  • 24