
I have a Parallel.ForEach loop that creates BinaryReaders on the same group of large data files.
I was wondering whether it hurts performance that these readers read the same files in parallel (i.e., would it go faster if each reader read a different file?).
I am asking because there is a lot of disk I/O involved (I guess...).

Edit: I forgot to mention that I am using an Amazon EC2 instance and the data is on the C:\ disk assigned to it. I have no idea how this affects the issue.

Edit 2: I'll take measurements by duplicating the data folder and reading from two different sources, and see what that gives.

Mehdi LAMRANI
  • I'd imagine the answer would be the same as mine was [here](http://stackoverflow.com/questions/8470306/multithreaded-file-compare-performance). – M.Babcock Mar 11 '12 at 21:54

2 Answers


It's not a good idea to read from the same mechanical disk using multiple threads. The disk's head has to physically move to seek to each new read location, so multiple threads end up bouncing it back and forth, which hurts performance.

The best approach is actually to read the files sequentially using a single thread and then hand the chunks to a group of threads that process them in parallel.
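A minimal sketch of that single-reader, many-workers layout in C#, assuming .NET 4's `BlockingCollection<T>` as the hand-off queue. The class name, `ChunkSize`, `MaxQueuedChunks`, and the `processChunk` callback are made up for illustration and are not taken from the question's code. Error handling inside the workers is left out.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

static class SequentialReadParallelProcess
{
    // Hypothetical sizes; tune them for your data and memory budget.
    const int ChunkSize = 4 * 1024 * 1024;  // bytes read per chunk
    const int MaxQueuedChunks = 8;          // bounds the memory held in the queue

    // Reads every file on the calling thread, one file after another, and
    // hands each chunk to `workerCount` tasks that run `processChunk`.
    public static void Run(IEnumerable<string> paths, int workerCount, Action<byte[]> processChunk)
    {
        using (var queue = new BlockingCollection<byte[]>(MaxQueuedChunks))
        {
            // Start the consumers: each one pulls chunks until the queue is
            // marked complete and drained.
            var workers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = Task.Factory.StartNew(() =>
                {
                    foreach (var chunk in queue.GetConsumingEnumerable())
                        processChunk(chunk);
                }, TaskCreationOptions.LongRunning);
            }

            try
            {
                // The only code that touches the disk: one sequential pass.
                foreach (var path in paths)
                {
                    using (var stream = File.OpenRead(path))
                    using (var reader = new BinaryReader(stream))
                    {
                        byte[] chunk;
                        // ReadBytes returns an empty array at end of stream.
                        while ((chunk = reader.ReadBytes(ChunkSize)).Length > 0)
                            queue.Add(chunk); // blocks while the queue is full
                    }
                }
            }
            finally
            {
                queue.CompleteAdding();  // lets the workers finish their loops
                Task.WaitAll(workers);   // wait before the queue is disposed
            }
        }
    }
}
```

You would call it with something like `SequentialReadParallelProcess.Run(files, Environment.ProcessorCount, chunk => Parse(chunk))`, where `files` and `Parse` are placeholders for your own file list and processing code. Note that fixed-size chunks can cut a record in half, so if your data is record-based you will probably want to split on record boundaries in the reader instead.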

Tudor
  • This is VERY Bad News my friend. I have to re-write the whole application core :((( – Mehdi LAMRANI Mar 12 '12 at 13:13
  • Doesn't really change anything. If it's a single mechanical disk you shouldn't expect any speedup from reading with multiple threads. – Tudor Mar 12 '12 at 15:21

It depends on where your files are. If you're using one mechanical hard-disk, then no - don't read files in parallel, it's going to hurt performance. You may have other configurations, though:

  • On a single SSD, reading files in parallel will probably not hurt performance, but I don't expect you'll gain anything.
  • On two mirrored disks using RAID 1 and a half-decent RAID controller, you can read two files at once and gain considerable performance.
  • If your files are stored on a SAN, you can most definitely read a few at a time and improve performance.

You'll have to try it, but be careful when you measure: if the files aren't large enough, the OS file cache is going to affect your measurements, and the second test run is going to be really fast.
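A rough way to compare the two on your own machine, as a sketch only. `ReadBenchmark` and its method names are invented for this example. It just times reading a set of files one after another against reading them through `Parallel.ForEach` with a capped degree of parallelism.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

static class ReadBenchmark
{
    // Reads a file to the end and returns the number of bytes read.
    public static long ReadWholeFile(string path)
    {
        var buffer = new byte[1024 * 1024];
        long total = 0;
        using (var stream = File.OpenRead(path))
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                total += read;
        }
        return total;
    }

    // Times reading all files one after another on the calling thread.
    public static TimeSpan TimeSequential(string[] paths)
    {
        var watch = Stopwatch.StartNew();
        foreach (var path in paths)
            ReadWholeFile(path);
        return watch.Elapsed;
    }

    // Times reading the same files with at most `maxThreads` concurrent readers.
    public static TimeSpan TimeParallel(string[] paths, int maxThreads)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        var watch = Stopwatch.StartNew();
        Parallel.ForEach(paths, options, path => { ReadWholeFile(path); });
        return watch.Elapsed;
    }
}
```

Because of the caching issue, don't run both variants back to back on the same files and trust the second number. Use a separate copy of the data for each run, or a data set that is much larger than the machine's RAM, and repeat the runs in both orders to see whether the results hold up.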

zmbq