6

When I first discovered threads, I tried checking that they actually worked as expected by calling sleep in many threads, versus calling sleep normally. It worked, and I was very happy.

But then a friend of mine told me that these threads weren't really parallel, and that sleep must be faking it.

So I wrote this test, which does some real processing:

class Test
  ITERATIONS = 1000

  def run_threads
    start = Time.now

    t1 = Thread.new do
      do_iterations
    end

    t2 = Thread.new do
      do_iterations
    end

    t3 = Thread.new do
      do_iterations
    end

    t4 = Thread.new do
      do_iterations
    end

    t1.join
    t2.join
    t3.join
    t4.join

    puts Time.now - start
  end

  def run_normal
    start = Time.now

    do_iterations
    do_iterations
    do_iterations
    do_iterations

    puts Time.now - start
  end

  def do_iterations
    1.upto ITERATIONS do |i|
      999.downto(1).inject(:*) # 999!
    end
  end
end

And now I'm very sad, because run_threads() not only didn't perform better than run_normal, it was even slower!

Then why should I complicate my application with threads, if they aren't really parallel?

** UPDATE **

@fl00r said that I could take advantage of threads if I used them for IO tasks, so I wrote two more variations of do_iterations:

def do_iterations
  # filesystem IO
  1.upto ITERATIONS do |i|
    5.times do
      # create file
      content = "some content #{i}"
      file_name = "#{Rails.root}/tmp/do-iterations-#{UUIDTools::UUID.timestamp_create.hexdigest}"
      file = ::File.new file_name, 'w'
      file.write content
      file.close

      # read and delete file
      file = ::File.new file_name, 'r'
      content = file.read
      file.close
      ::File.delete file_name
    end
  end
end

def do_iterations
  # MongoDB IO (through MongoID)
  1.upto ITERATIONS do |i|
    TestModel.create! :name => "some-name-#{i}"
  end
  TestModel.delete_all
end

The performance results are still the same: the normal version is faster than the threaded one.

But now I'm not sure if my VM is able to use all the cores. Will be back when I have tested that.

HappyDeveloper

5 Answers

7

Threads can be faster only if you have some slow IO.

Ruby (MRI) has a Global Interpreter Lock, so only one thread can execute at a time. On top of that, Ruby spends a lot of time on thread scheduling, deciding which thread should run at any given moment. So in your case, where there is no IO at all, the threaded version will be slower!

You can use Rubinius or JRuby to get real threads.

Example with IO:

module Test
  extend self

  def run_threads(method)
    start = Time.now

    threads = []
    4.times do
      threads << Thread.new{ send(method) }
    end

    threads.each(&:join)

    puts Time.now - start
  end

  def run_forks(method)
    start = Time.now

    4.times do
      fork do
        send(method)
      end
    end
    Process.waitall

    puts Time.now - start
  end

  def run_normal(method)
    start = Time.now

    4.times{ send(method) }

    puts Time.now - start
  end

  def do_io
    system "sleep 1"
  end

  def do_non_io
    1000.times do |i|
      999.downto(1).inject(:*) # 999!
    end
  end
end

Test.run_threads(:do_io)
#=> ~ 1 sec
Test.run_forks(:do_io)
#=> ~ 1 sec
Test.run_normal(:do_io)
#=> ~ 4 sec

Test.run_threads(:do_non_io)
#=> ~ 7.6 sec
Test.run_forks(:do_non_io)
#=> ~ 3.5 sec
Test.run_normal(:do_non_io)
#=> ~ 7.2 sec

IO jobs are 4 times faster with threads and processes, while non-IO jobs are about twice as fast with processes as with threads or the synchronous version.

Ruby also has Fibers, lightweight "coroutines", and the awesome em-synchrony gem for handling asynchronous processes.
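
As a minimal sketch of what a Fiber is (core Ruby only, no gems; the variable names are just for illustration): a fiber runs until it calls Fiber.yield, then hands control back to the caller until it is resumed. Nothing runs in parallel here, the switching is explicit.

```ruby
# A fiber is a block that can pause itself and be resumed later.
log = []

fiber = Fiber.new do
  log << "step 1"
  Fiber.yield :paused      # hand control back to the caller
  log << "step 2"
  :done                    # the block's value is returned by the last resume
end

first = fiber.resume       # runs until Fiber.yield
log << "caller runs in between"
second = fiber.resume      # continues after Fiber.yield

puts log
puts first.inspect, second.inspect
```

The caller decides when the fiber continues, which is what libraries like em-synchrony build on to make asynchronous code read sequentially.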

fl00r
5

fl00r is right: the global interpreter lock prevents multiple threads from running at the same time in Ruby, except while they are waiting on IO.

The parallel library is a very simple library that is useful for truly parallel operations. Install with gem install parallel. Here is your example rewritten to use it:

require 'parallel'
class Test
  ITERATIONS = 1000

  def run_parallel
    start = Time.now

    results = Parallel.map([1,2,3,4]) do |val|
      do_iterations
    end

    # do what you want with the results ...
    puts Time.now - start
  end

  def run_normal
    start = Time.now

    do_iterations
    do_iterations
    do_iterations
    do_iterations

    puts Time.now - start
  end

  def do_iterations
    1.upto ITERATIONS do |i|
      999.downto(1).inject(:*) # 999!
    end
  end
end

On my computer (4 cpus), Test.new.run_normal takes 4.6 seconds, while Test.new.run_parallel takes 1.65 seconds.
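
The speedup comes from running the block in separate processes rather than threads. Below is a rough sketch of that idea using only core Ruby (fork, IO.pipe, Marshal). It is not the gem's actual implementation; `fork_map` is a hypothetical helper name, and fork is not available on every platform (for example Windows).

```ruby
# Hypothetical helper: run the block for each item in its own child process
# and collect the results through pipes. Requires a platform with fork.
def fork_map(items)
  children = items.map do |item|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.write Marshal.dump(yield(item)) # send the result to the parent
      writer.close
    end
    writer.close                             # the parent only reads
    [pid, reader]
  end

  children.map do |pid, reader|
    result = Marshal.load(reader.read)       # blocks until the child is done
    reader.close
    Process.wait(pid)
    result
  end
end

factorials = fork_map([5, 10, 20]) { |n| n.downto(1).inject(:*) }
puts factorials.inspect
```

Each child has its own interpreter and its own lock, so CPU-bound work can use several cores, at the cost of process startup and serializing the results.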

David Miani
  • Wow I didn't know about that gem. I'll give it a try – HappyDeveloper Apr 19 '12 at 11:17
  • 3
    @HappyDeveloper Just be careful, it will spawn processes by default with pipe as exchange mechanism. It's not a thread and it's not lightweight. And I doubt you have any advantage if you use `:in_threads` option with normal Ruby. – Victor Moroz Apr 19 '12 at 14:24
4

The behavior of threads is defined by the implementation. JRuby, for example, implements threads with JVM threads, which in turn use real operating system threads.

The Global Interpreter Lock is only there for historic reasons. If Ruby 1.9 had simply introduced real threads out of nowhere, backwards compatibility would have been broken, and it would have slowed down its adoption even more.

This answer by Jörg W Mittag provides an excellent comparison between the threading models of various Ruby implementations. Choose one which is appropriate for your needs.

With that said, threads can be used to wait for a child process to finish:

pid = Process.spawn 'program'
thread = Process.detach pid

# Later...
status = thread.value.exitstatus
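
A self-contained variation of the snippet above, assuming the child is simply another Ruby interpreter that exits with status 3 (the exit code is arbitrary; RbConfig.ruby is the path of the running interpreter):

```ruby
require 'rbconfig'

# Spawn a child process that exits with an arbitrary status code.
pid = Process.spawn(RbConfig.ruby, '-e', 'exit 3')

# Process.detach returns a thread that waits for the child.
waiter = Process.detach(pid)

# The main thread is free to do other work here...

# ...and collects the Process::Status once it needs it.
status = waiter.value
puts status.exitstatus
```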
Matheus Moreira
2

Even if Threads don't execute in parallel they can be a very effective, simple way of accomplishing some tasks, such as in-process cron-type jobs. For example:

Thread.new{ loop{ download_nightly_logfile_data; sleep TWENTY_FOUR_HOURS } }
Thread.new{ loop{ send_email_from_queue; sleep ONE_MINUTE } }
# web server app that queues mail on actions and shows current log file data
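
The same pattern in a form that runs as-is, as a sketch only: `every` is a hypothetical helper, the interval is shortened to a fraction of a second, and a Queue stands in for the real job.

```ruby
require 'thread' # only needed for Queue on older Rubies; harmless otherwise

# Hypothetical helper: run the block repeatedly in a background thread,
# sleeping `interval` seconds between runs.
def every(interval)
  Thread.new do
    loop do
      yield
      sleep interval
    end
  end
end

ticks = Queue.new
job = every(0.05) { ticks << Time.now }

sleep 0.3   # the main thread keeps doing its own work
job.kill    # stop the background loop
job.join

puts "job ran #{ticks.size} times"
```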

I also use Threads in a DRb server to handle long-running calculations for one of my web applications. The web server starts a calculation in a thread and immediately continues responding to web requests. It can periodically peek in on the status of the job and see how it's progressing. For more details, read DRb Server for Long-Running Web Processes.
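
Leaving DRb out of the picture, the "start a job and peek at its status" part can be sketched roughly like this. The `:progress` key and the timings are made up for illustration; thread-local variables (Thread#[]) serve as a simple status board.

```ruby
# Start a long-running calculation in a thread and poll its progress.
job = Thread.new do
  Thread.current[:progress] = 0
  10.times do |i|
    sleep 0.02                         # stand-in for a slice of real work
    Thread.current[:progress] = (i + 1) * 10
  end
  :finished                            # becomes job.value
end

# The caller keeps doing its own thing and peeks in from time to time.
while job.alive?
  puts "progress: #{job[:progress].to_i}%"
  sleep 0.05
end

result = job.value
puts "result: #{result}, progress: #{job[:progress]}%"
```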

Phrogz
1

For a simple way to see the difference, use sleep instead of real IO, which depends on too many variables:

class Test
  ITERATIONS = 1000

  def run_threads
    start = Time.now
    threads = []

    20.times do
      threads << Thread.new do
        do_iterations
      end
    end

    threads.each {|t| t.join } # also can be written: threads.each &:join

    puts Time.now - start
  end

  def run_normal
    start = Time.now

    20.times do
      do_iterations
    end

    puts Time.now - start
  end

  def do_iterations
    sleep(10)
  end
end

This will show a difference between the threaded and the sequential version even on MRI, with the GIL.
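
A scaled-down sketch of the same comparison that finishes in well under a second, assuming five workers and 0.1-second sleeps (both numbers are arbitrary) and using Benchmark.realtime from the standard library:

```ruby
require 'benchmark'

workers = 5
nap     = 0.1 # seconds; stands in for any blocking wait

sequential = Benchmark.realtime do
  workers.times { sleep nap }
end

threaded = Benchmark.realtime do
  workers.times.map { Thread.new { sleep nap } }.each(&:join)
end

puts "sequential: #{sequential.round(2)}s" # roughly workers * nap
puts "threaded:   #{threaded.round(2)}s"   # roughly nap
```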

Liorsion