Using Ruby threads, but not seeing a speed up

Question

I'm attempting to write a multi-threaded solution for Project Euler's problem 14, but I'm not really seeing a speed up. There isn't any shared resources and no Mutex locks are being used... Is my code slow because of context switches? Am I not correctly understanding the benefit of threads?

http://projecteuler.net/problem=14

require 'benchmark'

benchmark_results = Benchmark.measure do
  threads = []
  num_threads = 10

  num_threads.times do |thread_num|
    threads << Thread.new(thread_num + 1) do |thread_num|
      Thread.current["max_length"] = 0
      (thread_num..1000000).step(num_threads).each do |i|
        next if i.even?
        current = i
        length = 0

        until current == 1
          if current.even?
            current = current / 2
          else
            current = current * 3 + 1
          end
          length += 1
        end

        if length > Thread.current["max_length"]
          Thread.current["max_length"] = length
          Thread.current["max_i"] = i
        end
      end
    end
  end

  threads.each { |thread| thread.join; print "#{thread['max_i']} -> #{thread['max_length']}\n" }
end

puts benchmark_results

Wasn't MRI Ruby not able to use actual multiple system threads do to GIL? As far as I know real multithreading is only possible with JRuby or other implementations. — Peterdk, Dec 21 '13 at 00:35
@Peterdk Depends .. http://stackoverflow.com/questions/56087/does-ruby-have-real-multithreading — user2864740, Dec 21 '13 at 00:41

score 2 · Answer 1 · answered Dec 21 '13 at 00:42

As far as I know, most ruby implementations don't use real threads (OS level) or use them only with some kind of lock, so such implementations won't be able to benefit from multiple cores/processor threads. (See http://en.wikibooks.org/wiki/Ruby_Programming/Reference/Objects/Thread )

That should effectively prevent you from reaping benefits from threads if your application is CPU bound. If it is IO bound, on the other hand, then threads might help. (Then, while one green thread waits for IO, other green threads can use your alloted CPU time. Physically, however, there's still only one CPU core being used).

I once used ruby (1.9.x) threads in a web scraping script, yielding a great speedup. While one thread was downloading (IO), others were parsing HTML and looking for other things to download. I wouldn't expect any speedup in an app that does math, though. — Petr Skocik, Dec 21 '13 at 00:47

score 1 · Accepted Answer · edited May 23 '17 at 10:32

There are many different implementations of Ruby. The most referred is MRI (see: other question.

MRI has threads, but unfortunately uses only one CPU core at a time. That means: Only one thread will actually run at the time.

If your thread had to wait for IO to happen, there may be a speed up. Because if one thread has to wait, another thread can catch up. But your problem need the CPU all the time.

I would suggest investigate another Ruby implementation like JRuby for this kind of problem. JRuby has real threads.

Perhaps you will have a greater speed up, if you change your implementation. In the moment you recalculate every max_length over and over again. For example: The sequence length for n = 4 will be 3. If you calculate the length for n = 8, you do one step (n / 2) and than have a current of 4 and you will already know that n = 4 has length = 3: Therefore length(8) = 1 + length(4) = 1 + 4 = 5. Example:

class CollatzSequence

  def initialize
    @lengths = Hash.new { |h, n| cache_length(h, n) }
  end

  def length(n)
    @lengths[n]
  end

private

  def cache_length(h, n)
    if n <= 1
      h[n] = 1
    else
      next_in_seqence = n.even? ? (n / 2) : (n * 3 + 1)
      h[n] = 1 + h[next_in_seqence]
    end
  end

end

require 'benchmark'
sequencer = CollatzSequence.new

Benchmark.bm(10) do |bm| 
  bm.report('not cached')  { sequencer.length(837799)     } 
  bm.report('cache hit 1') { sequencer.length(837799)     } 
  bm.report('cache hit 2') { sequencer.length(837799 * 2) } 
end

#                  user     system      total        real
# not cached   0.000000   0.000000   0.000000 (  0.001489)
# cache hit 1  0.000000   0.000000   0.000000 (  0.000007)
# cache hit 2  0.000000   0.000000   0.000000 (  0.000011)

score 1 · Answer 3 · answered Dec 21 '13 at 00:53

1

All you need is one process to find the longest Collatz chain starting from an odd number less than 1000000, and another one to find the longest one starting from an even number less than 1000000. Running several instances of a script in separate cores isn't too difficult if you start them all manually. It's cheap and dirty, but it works :-) But they can't just be threads within a process, they must be separate processes. (I think that what I call "processes" are what ThorX89 calls "OS threads".)

answered Dec 21 '13 at 00:53

David Knipe

3,417
1
19
19

I wrote "real threads (OS level)", and by that I meant native threads, i.e. threads within a single process that are scheduled by the OS's kernel rather than by the ruby implementation within a single process. – Petr Skocik Dec 21 '13 at 01:07

score 0 · Answer 4 · answered Dec 21 '13 at 20:33

Apart from threading, there are other ways to speed this up.

I seem to recall Ruby is a particularly slow language, designed by a guy who didn't care about performance. Maybe you could use a different language.

More importantly: You're doing this the naive way. It works, but a lot of the calculations are repeated many times. For instance, you've got the following Collatz sequencs:

         7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1
                11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1
29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1

Most of the steps in the sequences (e.g. 52 -> 26) are calculated more than once. There's clearly room for some kind of optimisation here. You've done a bit of optimisation by ignoring the sequences that start with even numbers (BTW you forgot to correct for this when collating your results). I've found a faster way of doing this, and compared it with the naive approach. For the first 10,000 numbers instead of the first 1,000,000, the naive method took 63 seconds; the naive approach ignoring even numbers took 35 seconds; and my method took 5 seconds. For the full 1,000,000, my algorithm took 9 minutes; I didn't try running the others. (All of these were written in Perl.) If you don't want any more details about how I did it, look away now!

Instead of just calculating each result and then forgetting it, I made an array of results as I went along. Now suppose you've calculated all the Collatz sequence lengths up to 12. To calculate the length of the sequence starting from 13, you start from 13 and continue until you get to a number less than 13. The sequence goes 13, 40, 20, 10. You look at element 10 in the array and find that it's 6 steps from 10 to 1. You know it's 3 steps from 13 to 10, because you just did those steps. Therefore it's 9 steps from 13 to 1, so you set 9 as element 13 in the array.

There's no obvious/nice way to do this with threads. I think it would be possible to come up with something if you really wanted to, but it's not worth the effort. Maybe if they'd said a billion...

Just wanted to test threads out :D. Need to use memoization for an actual solution and theoretically, I would be able to use threads AND a shared resource with Mutex locks to get it to work... Seeing that there isn't a speed up with green threads, however, I've concluded there's not much of a point to use threads to solve this problem. — zilla, Dec 21 '13 at 21:30

Using Ruby threads, but not seeing a speed up

4 Answers4