I am trying to understand processes in Ruby. I am creating 4 child processes from my parent process. The main process starts by writing to a file, then creates the child processes, each of which writes to the same file:

require 'csv'
a = [1, 2, 3, 4]
CSV.open("temp_and_cases_batch_parallel.csv", "ab") do |target_file|
  target_file << ["hello from parent process #{Process.pid}"]
  a.each do |num|
    pid = Process.fork do
      target_file << ["hello from child Process #{Process.pid}"]
    end
    puts "parent, pid #{Process.pid}, waiting on child pid #{pid}"
  end
end
Process.wait
puts "parent exiting"

The file output I expect:

hello from parent process 3336
hello from child Process 3350
hello from child Process 3351
hello from child Process 3349
hello from child Process 3352

The file output I actually get:

hello from parent process 3336
hello from parent process 3336
hello from child Process 3350
hello from parent process 3336
hello from child Process 3351
hello from parent process 3336
hello from child Process 3349
hello from parent process 3336
hello from child Process 3352

It seems like the insert from the parent process is rerun 5 times. How is that possible? What is going on here?

David Geismar

1 Answer

Having multiple processes write to the same file is usually not a good idea. In most cases, unless you absolutely know what you are doing, the result will be unpredictable, as you just demonstrated with your example.

You get this strange result because the Ruby IO object has its own internal buffer. This buffer is kept in memory and is NOT guaranteed to be written to disk when you call <<.

Here, the string hello from parent is only written to the internal buffer, not to disk. When you then call fork, that buffer is copied into the child. The child appends hello from child to its copy of the buffer, and only THEN is the buffer flushed to disk.

The result is that every child writes hello from parent in addition to hello from child, because that is what its copy of the memory buffer contains by the time Ruby decides to write it to disk. The parent also flushes its own copy once when the file is closed, which is why the parent line shows up five times in total.
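The buffer copy described above can be reproduced with a plain File and a single child. This is only a minimal sketch: the method name lines_without_flush and the scratch file demo.txt are made up for illustration, and it assumes a platform where Process.fork is available (so not Windows).

```ruby
require 'tmpdir'

# Hypothetical helper: the parent writes one line, forks one child that
# writes another line, and nothing is flushed in between.
# Returns the lines that end up in the scratch file.
def lines_without_flush
  Dir.mktmpdir do |dir|
    path = File.join(dir, "demo.txt")   # hypothetical scratch file
    File.open(path, "a") do |f|
      f.write("parent\n")               # sits in Ruby's in-memory buffer
      pid = Process.fork do
        f.write("child\n")              # appended to the child's copy of the buffer
      end                               # the child exits here and flushes its copy
      Process.wait(pid)
    end                                 # the parent flushes its own copy on close
    File.readlines(path, chomp: true)
  end
end

p lines_without_flush
```

With one child, the parent line should appear twice: once from the child's copy of the buffer and once from the parent's own copy.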

To get around this problem, call flush on the IO object (IO#flush) before forking, so that the memory buffer is written to disk and is empty at the time of the fork. Each child then starts with an empty buffer, and you get your expected output:

CSV.open(...) do |target_file|
  target_file << ...
  target_file.flush  # <-- Make sure the internal buffer is flushed to disk before forking

  a.each do |num|
    ... Process.fork ...
  end
end
...
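For reference, here is one way the outline above might look when filled in with the code from the question. Treat it as a sketch under assumptions rather than the definitive fix: the method name rows_with_flush and the scratch file demo.csv are hypothetical, and Process.waitall is my own substitution for the single Process.wait so that every child is reaped before the file is read back.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: writes the parent row, flushes, then forks one child
# per element of nums. Returns the rows found in the file afterwards.
def rows_with_flush(nums)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "demo.csv")   # hypothetical scratch file
    CSV.open(path, "ab") do |target_file|
      target_file << ["hello from parent process #{Process.pid}"]
      target_file.flush                 # empty the buffer before any fork
      nums.each do
        Process.fork do
          target_file << ["hello from child Process #{Process.pid}"]
        end
      end
      Process.waitall                   # wait for every child, not just one
    end
    CSV.read(path).flatten
  end
end

rows = rows_with_flush([1, 2, 3, 4])
puts rows
```

The order of the child rows still depends on scheduling, so only the counts (one parent row, four child rows) should be relied on.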
Casper