I am writing a program that loads data from four XML files into four different data structures. It has methods like this:
def loadFirst(year)
File.open("games_#{year}.xml",'r') do |f|
doc = REXML::Document.new f
...
end
end
def loadSecond(year)
File.open("teams_#{year}.xml",'r') do |f|
doc = REXML::Document.new f
...
end
end
etc...
I originally just used one thread and loaded one file after another:
def loadData(year)
time = Time.now
loadFirst(year)
loadSecond(year)
loadThird(year)
loadFourth(year)
puts Time.now - time
end
Then I realized that I should be using multiple threads. My expectation was that loading from each file on a separate thread would be very close to four times as fast as doing it all sequentially (I have a MacBook Pro with an i7 processor):
def loadData(year)
time = Time.now
t1 = Thread.start{loadFirst(year)}
t2 = Thread.start{loadSecond(year)}
t3 = Thread.start{loadThird(year)}
loadFourth(year)
t1.join
t2.join
t3.join
puts Time.now - time
end
What I found was that the version using multiple threads is actually slower than the other. How can this possibly be? The difference is around 20 seconds with each taking around 2 to 3 minutes.
There are no shared resources between the threads. Each opens a different data file and loads data into a different data structure than the others.