Why does #join on a Thread object work differently when called with an iterator than with a loop?

Question

Applying #join on Thread objects inside a loop executes them sequentially.

5.times do |x|
  Thread.new {
    t = rand(1..5) * 0.25
    sleep(t)
    puts "Thread #{x}:  #{t} seconds"
  }.join
end

# Output
# Thread 0:  1.25 seconds
# Thread 1:  1.25 seconds
# Thread 2:  0.5 seconds
# Thread 3:  0.75 seconds
# Thread 4:  0.25 seconds

On the other hand, applying #join to an array of Thread objects with an iterator executes them concurrently. Why?

threads = []

5.times do |x|
  threads << Thread.new {
    t = rand(1..5) * 0.25
    sleep(t)
    puts "Thread #{x}:  #{t} seconds"
  }
end

threads.each(&:join)

# Output
# Thread 1:  0.25 seconds
# Thread 3:  0.5 seconds
# Thread 0:  1.0 seconds
# Thread 4:  1.0 seconds
# Thread 2:  1.25 seconds

If you call `join` within the loop, it blocks your code at that point, waits for the thread to finish, and then continues the loop. — Stefan, Dec 09 '21 at 13:37
@Stefan what happens when I call while I am iterating the array? That doesn't block the array iteration? I am just trying to understand. — Rajagopalan, Dec 09 '21 at 14:20
@Rajagopalan you mean `threads.each(&:join)`? That `join` also blocks until the 1st thread has finished, then blocks until the 2nd has finished and so on. However, since all threads have already been started, they can run concurrently. — Stefan, Dec 09 '21 at 15:11
It _never_ makes sense to "join" a thread immediately after creating it. The only reason for ever creating a thread is if the caller is going to do something else while the new thread is running. In your second example, the "something else" that the caller does is, it creates more threads. — Solomon Slow, Dec 09 '21 at 15:26
@Rajagopalan, I submitted my own answer. The explanations given by Stefan and Solomon Slow helped, but I still hadn't quite grasped the concept. Hope it helps you too. — Nadim Hussami, Dec 11 '21 at 15:21

Nadim Hussami · Accepted Answer · 2021-12-11T15:41:09.067

There are several points to address here.

When a thread starts

Instantiating a Thread with #new, #start, or #fork immediately starts running that thread's code, concurrently with the main thread. However, in a short script that never 'joins' the thread, the main thread typically reaches the end of the script first, and Ruby kills any threads that are still running when the main thread exits. To a newcomer, this gives the false impression that #join is what starts the thread.

thread = Thread.new {
   puts "Here's a thread"
}

# (No output)

Adding a short delay to the calling main thread gives the called thread a chance to finish.

thread = Thread.new {
   puts "Here's a thread"
}

sleep(2)

# Here's a thread
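A fixed sleep only guesses how long the thread needs. The following is a minimal sketch, not part of the original post, showing that the thread is already running before #join is ever called. The names `started_signal`, `early_thread`, and `signal` are hypothetical; the Queue is only there so the main thread can wait for the thread's first action.

```ruby
# Hypothetical demo: a thread-safe Queue lets the new thread report that it is running.
started_signal = Queue.new

early_thread = Thread.new do
  started_signal << :running   # runs as soon as the thread is scheduled, no #join needed
  sleep(0.1)
end

signal = started_signal.pop    # blocks the main thread until the new thread has pushed
puts "Thread reported #{signal.inspect} before #join was called"

early_thread.join              # now wait for the thread to finish
puts "Alive after join? #{early_thread.alive?}"
```

Here #join is used only at the end, to wait for completion; the thread's block had already begun executing when the signal arrived.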

What #join actually does

#join blocks the calling thread (in these examples, the main thread), and only the calling thread, until the joined thread has finished. Any other threads that were started earlier are not affected; they have been running concurrently and continue to do so.
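As a rough illustration of this point, the sketch below joins one thread while a second one keeps running untouched. The names `log`, `background`, `worker`, and `events` are hypothetical and the sleep times are arbitrary.

```ruby
# Hypothetical demo: record the order of events in a thread-safe Queue.
log = Queue.new

background = Thread.new do
  sleep(0.1)
  log << :background_done
end

worker = Thread.new do
  sleep(0.3)
  log << :worker_done
end

worker.join            # the main thread waits here for `worker` only
log << :main_resumed   # recorded only after `worker` has finished

background.join        # `background` was never blocked by the join above

# Drain the queue into an array to inspect the order of events.
events = []
events << log.pop until log.empty?
p events
```

Joining `worker` pauses only the main thread, so `:main_resumed` always appears after `:worker_done`, while `background` runs to completion on its own schedule.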

The original examples explained

In the first example, the loop starts a thread, and immediately 'joins' it. Since #join blocks the main thread, the loop is paused until the first thread is completed. Then the loop iterates, starts a second thread, 'joins' it, and pauses the loop once again until this thread is completed. It's purely sequential and completely negates the point of threads.

5.times do |x|
  Thread.new {
    t = rand(1..5) * 0.25
    sleep(t)
    puts "Thread #{x}:  #{t} seconds"
  }.join                             # <--- this #join is the culprit.
end
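One way to see the sequential behaviour is to time it. This is a sketch with fixed, hypothetical sleep durations (instead of the random ones in the question) so that the total is predictable; `seq_durations` and `sequential_elapsed` are illustrative names.

```ruby
# Hypothetical fixed durations so the expected total is easy to reason about.
seq_durations = [0.1, 0.2, 0.3]

seq_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

seq_durations.each do |t|
  # Start a thread, then immediately wait for it before starting the next one.
  Thread.new { sleep(t) }.join
end

sequential_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - seq_start
puts "Join inside the loop took about #{sequential_elapsed.round(2)} seconds"
```

The elapsed time should come out close to the sum of the durations (roughly 0.6 seconds here), the same as if no threads had been used at all.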

User Solomon Slow put it best in his comment in the original post.

It never makes sense to "join" a thread immediately after creating it. The only reason for ever creating a thread is if the caller is going to do something else while the new thread is running. In your second example, the "something else" that the caller does is, it creates more threads.

The second example does multithreading right. The loop starts a thread, iterates, starts the next thread, iterates, and so on. Because we haven't used #join inside the loop, the main thread keeps iterating and starts all the threads.

So how does using #join in an iterator not pose the same problem as the first example? Because these threads have already been running concurrently. Remember #join only blocks the main thread until the 'joined' thread is complete. This called thread and all other called threads have been running since the loop that created them, and they will continue to run and finish independently of the main thread and of each other. 'Joining' all threads sequentially just tells the main thread:

Don't continue until Thread 1 is done (but it's possible this thread, and some, all, or none of the other threads may have already finished).
Don't continue until Thread 2 is done (but it's possible this thread, and some, all, or none of the remaining threads may have already finished).
...
Don't continue until Thread 5 is done (but it's possible this thread has already finished, while all the other threads have definitely already finished).

In effect this last line sequentially instructs the main thread to pause, but it does not hinder the called threads.

threads.each(&:join)
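Timing this pattern makes the difference visible. The sketch below uses fixed, hypothetical sleep durations rather than the random ones from the question; `conc_durations`, `conc_threads`, and `concurrent_elapsed` are illustrative names.

```ruby
# Hypothetical fixed durations so the expected total is easy to reason about.
conc_durations = [0.1, 0.2, 0.3]

conc_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Start every thread first; none of them is joined yet, so they all sleep concurrently.
conc_threads = conc_durations.map { |t| Thread.new { sleep(t) } }

# Only now wait for each one in turn. Threads that finished earlier return immediately.
conc_threads.each(&:join)

concurrent_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - conc_start
puts "Join after starting all threads took about #{concurrent_elapsed.round(2)} seconds"
```

The elapsed time should be close to the longest single duration (roughly 0.3 seconds here) rather than the sum, because the joins only wait for work that was already in progress.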

I also found this explanation very helpful.

That's the best explanation, I use thread as well but still did not grasp it properly. Best research and well written, up voted. — Rajagopalan, Dec 11 '21 at 15:32
