I have a program that scans multiple files and replaces certain strings in them (usually 100-200 files, each no larger than 10-20 KB).

I would have thought that a Parallel ForEach would be the way to go here, but it is actually slower than a good ol' serial ForEach. Any idea why?

Matt Cashatt
  • Possible duplicate of [How to properly parallelise job heavily relying on I/O](http://stackoverflow.com/questions/8505815/how-to-properly-parallelise-job-heavily-relying-on-i-o) – Preston Guillot Nov 02 '15 at 00:13
  • When the operation reaches the hardware, there's still only one disk. Task switching across CPU cores is easy, but the overhead on a disk is costly. – David Nov 02 '15 at 00:14
  • @David Awesome answer, thanks! – Matt Cashatt Nov 02 '15 at 00:15
  • @MatthewPatrickCashatt Take a breath. Preston's comment is automatically generated by the system as a result of his close vote. Nobody else can see the `Question may already have an answer...` bit except you - its only purpose is to prompt you that someone thought this *might* answer your question. If it does, you can accept it as having answered your question, if not, then don't worry about it. What's surprising is that you've been here over four years, 5k rep, and this is news to you... Keep calm, carry on. – J... Nov 02 '15 at 00:43
  • That said, it's probably worth reading the link that Preston provided just the same - it may not answer your question, but if you're surprised that parallelizing disk I/O doesn't provide any performance gains then it's probably got a lot of information that you could learn from (which is, after all, why we are all here). – J... Nov 02 '15 at 00:48
  • @J...--Actually it's my tenure here that is making me increasingly impatient with people voting to close a question a nanosecond after it is posted. What is unfortunate is that this question has a direct answer that wasn't out there already and it is now being drowned out by everyone but the guy that had the answer. We are here to get answers; not take lessons. – Matt Cashatt Nov 02 '15 at 00:58

1 Answer

To quote @David in the comments on the question:

> When the operation reaches the hardware, there's still only one disk. Task switching across CPU cores is easy, but the overhead on a disk is costly.
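One way to check whether the disk really is the bottleneck on a given machine is to time the same replacement job serially and on a deliberately small thread pool. The question does not show its code, so what follows is only a rough sketch in Python rather than the original program; `replace_in_file`, `replace_serial`, `replace_parallel`, and `timed` are hypothetical names invented for this illustration, and the default of two workers is an arbitrary assumption, not a recommendation.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def replace_in_file(path: Path, old: str, new: str) -> bool:
    """Rewrite one file in place; return True if its contents changed."""
    text = path.read_text(encoding="utf-8")
    updated = text.replace(old, new)
    if updated == text:
        return False  # nothing matched, so skip the write entirely
    path.write_text(updated, encoding="utf-8")
    return True


def replace_serial(paths, old, new) -> int:
    """Process the files one after another; return how many changed."""
    return sum(replace_in_file(p, old, new) for p in paths)


def replace_parallel(paths, old, new, workers: int = 2) -> int:
    """Same job on a thread pool; `workers` caps concurrent disk requests."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda p: replace_in_file(p, old, new), paths))


def timed(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Calling `timed(replace_serial, paths, "old", "new")` and `timed(replace_parallel, paths, "old", "new", workers=2)` on separate copies of the same files gives two timings to compare. Repeated runs over the same files are often served from the operating system's file cache, so the numbers only mean something when each run starts from comparable conditions.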

Matt Cashatt
  • This does really make me wonder though... As I was wording the comment, I almost used terms like "drive arm" or "read/write head", which are increasingly aging terms. I wonder if ongoing development of solid state architectures will before long result in multiple I/O channels on devices. Sort of the non-volatile storage medium's equivalent of multiple CPU cores. Even with SSD, the storage is still a significant bottleneck in high-performance systems. But getting away from the limitations of moving parts really reduces the potential of that barrier. – David Nov 02 '15 at 01:45
  • @David--Funny you should say that; I just ordered an SSD from Amazon in hopes of alleviating the bottleneck. Thanks again! – Matt Cashatt Nov 02 '15 at 01:51