I'm working with 6 CSV files that each contain the attributes of an object. I can read them one at a time, but the idea of handing each one to its own thread so they load in parallel is very appealing.

I've created a database object (no relational DBs or ORMs allowed) that has an array for each kind of object it holds. I've tried the following to make each CSV open and initialize concurrently, but have seen no impact on speed.

  threads = []
  CLASS_FILES.each do |klass, filename|
    # Start one thread per CSV file
    threads << Thread.new do
      file_to_objects(klass, filename)
    end
  end
  # Wait for every file to finish loading
  threads.each(&:join)
  update
end

def self.load(filename)
  CSV.open("data/#{filename}", CSV_OPTIONS)
end

def self.file_to_objects(klass, filename)
  file = load(filename)
  # Strip the trailing "s.csv" to get the name of the Database accessor
  method_name = filename.sub("s.csv", "")
  file.each do |line|
    instance = klass.new(line.to_hash)
    Database.instance.send(method_name) << instance
  end
end

How can I speed things up in Ruby (MRI 1.9.3)? Is this a good case for Rubinius?

Chris

1 Answer

Even though Ruby 1.9.3 uses native threads to implement concurrency, it has a global interpreter lock (GIL) which makes sure only one thread executes Ruby code at a time.

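Therefore, no Ruby code really runs in parallel in C Ruby (MRI): threads can overlap only while one of them is blocked on I/O, and turning CSV rows into objects is mostly CPU-bound work. I know that JRuby has no global interpreter lock, so try using it to run your code, if possible.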
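To see the effect of the lock for yourself, here is a minimal sketch that runs the same CPU-bound work once sequentially and once with one thread per job. The busy_sum method and the job sizes are made up for illustration and simply stand in for parsing one file. On MRI I would expect the two timings to come out roughly equal, while an implementation without a global lock may show the threaded run finishing sooner.

```ruby
require 'benchmark'

# Hypothetical CPU-bound task standing in for parsing one CSV file.
def busy_sum(n)
  total = 0
  n.times { |i| total += i * i }
  total
end

# Four equally sized jobs (sizes are arbitrary).
jobs = Array.new(4, 200_000)

# Run the jobs one after another.
sequential_results = nil
sequential_time = Benchmark.realtime do
  sequential_results = jobs.map { |n| busy_sum(n) }
end

# Run the jobs with one thread each; Thread#value joins the thread
# and returns the result of its block.
threaded_results = nil
threaded_time = Benchmark.realtime do
  threads = jobs.map { |n| Thread.new { busy_sum(n) } }
  threaded_results = threads.map(&:value)
end

puts format('sequential: %.3fs, threaded: %.3fs', sequential_time, threaded_time)
```

The results are identical either way; only the wall-clock time can differ between implementations.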

This answer by Jörg W Mittag has a more in-depth look at the threading models of the various Ruby implementations. It isn't clear to me whether Rubinius is fit for the job, but I'd give it a try.
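If you do move to an implementation where threads really run in parallel, it is worth keeping shared state out of the threads. The following is only a sketch under assumptions: the record struct and the in-memory CSV strings are hypothetical stand-ins for your classes and the files under data/. Each thread builds and returns its own array, and the main thread collects them afterwards.

```ruby
require 'csv'

# Hypothetical record type and in-memory CSV data, used here instead of
# the question's classes and files.
record = Struct.new(:kind, :attributes)
sources = {
  customers: "id,name\n1,Ada\n2,Grace\n",
  items: "id,name\n10,Widget\n"
}

# Each thread parses one source and returns its own array, so no thread
# mutates a collection that another thread can see.
threads = sources.map do |kind, text|
  Thread.new do
    CSV.parse(text, headers: true).map { |row| record.new(kind, row.to_hash) }
  end
end

# Thread#value waits for the thread and returns the value of its block.
tables = Hash[sources.keys.zip(threads.map(&:value))]

tables.each { |kind, rows| puts "#{kind}: #{rows.length} rows" }
```

The main thread can then hand each array to the database object in one step, which avoids appending to it from several threads at once.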

Matheus Moreira