
I have a function that does IO and computation. As a demo, I wrote a function that copies about 300MB from one directory to another. If I run it in a thread that I immediately join, it is much slower than running it without a thread. I measured it with:

require 'fileutils'

# Copy the tree and print the elapsed time in seconds.
def cp
  start = Time.now
  FileUtils.cp_r("C:/tmp", "C:/tmp1")
  fin = Time.now - start
  p fin
end

Comparing these:

cp

Thread.new{cp}.join

the first cp call is always two to four times faster than the threaded call. The same happens if I do

cp

Thread.new{cp}

sleep 200

I have heard about the GIL, but here only one thread runs at a time, so there should be no contention for running time. Any ideas on why this happens, or how I can make it faster?

Darshan Rivka Whittle
Roman Smelyansky
  • I cannot reproduce the speed difference. The version called from a separate thread takes roughly the same time as the one called from the main process. Linux x86_64, Ruby 1.9.3p429. I'd also add that file operations depend heavily on the underlying operating system and its caching capabilities. The cache has to be cleared/invalidated between calls. – Torimus May 19 '13 at 16:07
  • The OP is on Windows, which might be causing speed differences due to its threading. I haven't dug into that, as I quit developing and running on Windows years ago, but I seem to remember it doesn't support threads like *nix systems. – the Tin Man May 19 '13 at 16:21
  • Wait... are you benchmarking both the plain and the with-thread version in the same run? Those need to be separate runs, so that you can isolate the effects of the file cache, etc. – Wayne Conrad May 19 '13 at 17:20
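
A minimal sketch of a more even-handed comparison, following the comments above. It assumes a small temporary tree stands in for `C:/tmp`; the `time_copy` helper, the `TIMINGS` hash, and the directory names are hypothetical. Each measurement copies into a fresh destination, but nothing here clears the OS file cache, so separate runs are still needed for trustworthy numbers.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: time one recursive copy into a fresh destination.
def time_copy(src, dest)
  FileUtils.rm_rf(dest)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  FileUtils.cp_r(src, dest)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

TIMINGS = {}

Dir.mktmpdir do |root|
  src = File.join(root, 'src')
  FileUtils.mkdir_p(src)
  # Small stand-in data set (assumption: the real test used ~300MB)
  5.times { |i| File.write(File.join(src, "file#{i}.bin"), 'x' * 100_000) }

  TIMINGS[:direct] = time_copy(src, File.join(root, 'direct'))
  # Thread#value joins the thread and returns the block's result
  TIMINGS[:threaded] = Thread.new { time_copy(src, File.join(root, 'threaded')) }.value
end

TIMINGS.each { |label, secs| puts format('%-8s %.4fs', label, secs) }
```

Using a monotonic clock instead of `Time.now` avoids skew if the system clock is adjusted during the run.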

1 Answer


Threading is no guarantee that code will run faster than, or even as fast as, non-threaded code, at least currently with MRI. JRuby might be better. Your cp isn't getting the full attention of the CPU, which is why running it without a thread, and letting it block until done, is faster.

Consider using fork instead.

"A dozen (or so) ways to start sub-processes in Ruby: Part 1" looks useful. Also "How do you spawn a child process in Ruby?".
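
A minimal sketch of the fork approach, assuming a platform where `Process.fork` is available. MRI on Windows does not implement it, so the sketch checks `Process.respond_to?(:fork)` first. The `copy_in_child` helper, the `CHILD_OK` constant, and the temporary paths are hypothetical.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: run the copy in a child process and return its exit status.
def copy_in_child(src, dest)
  pid = Process.fork do
    FileUtils.cp_r(src, dest)
  end
  # The parent could do other work here before waiting on the child.
  _, status = Process.wait2(pid)
  status
end

CHILD_OK =
  if Process.respond_to?(:fork)
    Dir.mktmpdir do |root|
      src = File.join(root, 'src')
      FileUtils.mkdir_p(src)
      File.write(File.join(src, 'a.txt'), 'hello')
      status = copy_in_child(src, File.join(root, 'dest'))
      status.success? && File.read(File.join(root, 'dest', 'a.txt')) == 'hello'
    end
  else
    nil # fork is unavailable on this platform
  end

puts "child copy succeeded: #{CHILD_OK.inspect}"
```

`Process.wait2` returns the child's pid together with a `Process::Status`, which is one way to find out whether the forked copy finished cleanly.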

the Tin Man
  • Hey, thanks for the answer, but two questions arise. When I joined the thread, it was just as slow as before; how can that be explained? And how can I track the status of my 'forked' threads? – Roman Smelyansky May 19 '13 at 15:57
  • "How can I track the status of my 'forked' threads?" Use `Process.wait`? – the Tin Man May 19 '13 at 16:04
  • Sorry, what I meant is that it is impossible to have, let's say, some dictionary which is updated by threads as they go. – Roman Smelyansky May 19 '13 at 16:07
  • No, it's not impossible to modify something by threads. That's what [`Queue`](http://www.ruby-doc.org/stdlib-2.0/libdoc/thread/rdoc/Queue.html) is for. – the Tin Man May 19 '13 at 16:18
  • I was talking about achieving this behavior with forked processes. It is impossible to change one critical object, so it is not sufficient to use fork. – Roman Smelyansky May 19 '13 at 16:49
  • Sigh. You said "updated by threads". That's not forked. It's important to use the right terms. Another method to consider is the "block" form of `Open3.popen3`, calling a sub-script or the built-in `copy` or `cp`. You can write/read using stdin/stdout to that sub-shell. Be aware of the need to close the `stdin` talking to the sub-shell, as that can help avoid hanging in some apps. Because you're on Windows there might be limitations to what's available to you. – the Tin Man May 19 '13 at 17:09
  • Thanks, Tin Man :). Can't rep you just yet. – Roman Smelyansky May 20 '13 at 06:14
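
The `Queue` suggestion from the comments could look roughly like this: worker threads push status messages onto a thread-safe queue, and the main thread folds them into a single hash. This is only a sketch; the job names and the `STATUS` hash are hypothetical, and the `sleep` stands in for real IO work.

```ruby
# Queue is thread-safe and built in on current Rubies;
# Ruby 1.9/2.0 needed `require 'thread'` first.
updates = Queue.new
jobs = %w[copy_a copy_b copy_c] # hypothetical job names

workers = jobs.map do |name|
  Thread.new do
    updates << [name, :started]
    sleep 0.01 # stand-in for the real IO work
    updates << [name, :done]
  end
end

workers.each(&:join)

# Only the main thread touches the hash, so it needs no locking.
STATUS = {}
until updates.empty?
  name, state = updates.pop
  STATUS[name] = state
end

p STATUS
```

Because each worker pushes `:started` before `:done` and the queue is first-in, first-out, the last state recorded for every job is `:done`. In a long-running program the main thread would pop from the queue while the workers are still running rather than after joining them.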