8

I have an Intel Core 2 Duo CPU, and I was reading 3 files from my C: drive and showing some matching values from the files in an EditBox on screen. The whole process takes 2 minutes. Then I thought of processing each file in a separate thread, and now the whole process takes 2 minutes 30 seconds, i.e. 30 seconds more than the single-threaded processing!

I was expecting the opposite! I can see both graphs in the CPU usage history. Can someone please explain what is going on? Here is my code snippet.

    foreach (FileInfo file in FileList)
    {
        Thread t = new Thread(new ParameterizedThreadStart(ProcessFileData));
        t.Start(file.FullName);
    }

where ProcessFileData is the method that processes the files.

Thanks!

  • I am not certain if you can check, but if both threads are on the same core then you won't see much improvement. Have you profiled your single-threaded and multi-threaded application, to see what is going on? – James Black Nov 16 '09 at 04:56
  • How do you profile the threads? – Nov 17 '09 at 03:38
  • Are you running your speed tests with Release builds? – Warpin Nov 16 '09 at 04:48

4 Answers

12

The root of the problem is that the files are on the same drive and, unlike your dual core processor, your hard drive can only do one thing at a time.

If you read two files simultaneously, the disk heads will jump from one file to the other and back again. Given that your hard drive can read each file in roughly 40 seconds, it now has the additional overhead of moving its disk head between the three separate files many times during the read.

The fastest way to read multiple files from a single hard drive is to do it all in one thread and read them one after another. This way, the head only moves once per file read (at the very beginning) and not multiple times per read.
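For illustration, here is a minimal sketch of that single-threaded, one-file-at-a-time approach. The file paths are placeholders, and the line-by-line loop stands in for whatever matching the question's ProcessFileData actually does:

    using System;
    using System.IO;

    class SequentialReadExample
    {
        // Stand-in for the question's ProcessFileData: reads one file from
        // start to finish and checks each line for the values of interest.
        static void ProcessFileData(string path)
        {
            using (StreamReader reader = new StreamReader(path))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // ... compare the line against the values you are matching ...
                }
            }
        }

        static void Main()
        {
            // Hypothetical paths; reading the files one after another keeps the
            // disk head streaming through each file instead of seeking between them.
            string[] files = { @"C:\data\a.txt", @"C:\data\b.txt", @"C:\data\c.txt" };
            foreach (string path in files)
            {
                ProcessFileData(path);
            }
        }
    }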

To optimize this process, you'll either need to change your logic (do you really need to read the entire contents of all three files?), or buy a faster hard drive, put the three files on three separate drives and use threading, or use a RAID array.

Michael La Voie
  • 27,772
  • 14
  • 72
  • 92
  • Would performance increase if each file were read and placed into a string first, and the matching were then done against those strings in separate threads? – Omar Nov 16 '09 at 05:23
  • Sure, if you were reading the files multiple times to perform a match, then definitely put them in memory and use as many cores as you have to search them. However, if you are only searching them once and can stop if a match is found, then it will be much faster to try matching as you read so you can stop the read if a match is found. Reading from HD is about 1000X slower than from RAM, so if you can stop reading the file part way through if a match is found, then that is a huge time saving. – Michael La Voie Nov 16 '09 at 06:21
  • Has anyone tried multi-threading disk IO to a solid state drive? – IAbstract Jun 08 '11 at 14:30
3

If you read from disk using multiple threads, then the disk heads will bounce around from one part of the disk to another as each thread reads from a different part of the drive. That can reduce throughput significantly, as you've seen.

For that reason, it's actually often a better idea to have all disk accesses go through a single thread, to help minimize disk seeks.
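As a rough sketch of that idea, and assuming .NET 4's BlockingCollection is available, one thread can own all disk access and hand lines to a worker thread for the matching. The paths and the matching step are placeholders:

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading;

    class SingleReaderThreadExample
    {
        static void Main()
        {
            // Bounded queue so the reader can't run arbitrarily far ahead of the worker.
            BlockingCollection<string> lines = new BlockingCollection<string>(1000);

            // Worker thread: does the CPU-side matching on whatever the reader hands over.
            Thread worker = new Thread(() =>
            {
                foreach (string line in lines.GetConsumingEnumerable())
                {
                    // ... compare the line against the values you are matching ...
                }
            });
            worker.Start();

            // Reader: the only thread that touches the disk, one file at a time.
            string[] files = { @"C:\data\a.txt", @"C:\data\b.txt", @"C:\data\c.txt" };
            foreach (string path in files)
            {
                using (StreamReader reader = new StreamReader(path))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        lines.Add(line);
                    }
                }
            }

            lines.CompleteAdding();   // signal the worker that no more lines are coming
            worker.Join();
        }
    }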

If your task is I/O bound and if it needs to run often, you might look at a tool like "contig" to make sure the layout of your files on disk is optimized / contiguous.

RickNZ
1

If your processing is mostly I/O bound rather than CPU bound, it makes sense that it takes the same amount of time, or even longer.

How do you compare those files? You should think about what the bottleneck of your application is: I/O (input/output), CPU, memory...

Multithreading is only interesting for CPU-bound processing, i.e. complex calculations, comparison of data in memory, sorting, etc.
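A rough sketch of that split: read the files sequentially (the I/O-bound part), then let one thread per file do the in-memory matching (the CPU-bound part). The paths and search value below are made up purely for illustration:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading;

    class CpuBoundMatchingExample
    {
        static void Main()
        {
            // Hypothetical paths and search value.
            string[] files = { @"C:\data\a.txt", @"C:\data\b.txt", @"C:\data\c.txt" };
            string searchValue = "42";

            // I/O-bound part: read everything on one thread, one file at a time.
            List<string[]> contents = new List<string[]>();
            foreach (string path in files)
            {
                contents.Add(File.ReadAllLines(path));
            }

            // CPU-bound part: the data is now in memory, so one thread per file
            // can scan it in parallel without fighting over the disk.
            List<Thread> threads = new List<Thread>();
            foreach (string[] fileLines in contents)
            {
                string[] lines = fileLines;   // copy so each thread sees its own file
                Thread t = new Thread(() =>
                {
                    foreach (string line in lines)
                    {
                        if (line.Contains(searchValue))
                        {
                            Console.WriteLine("match: " + line);
                        }
                    }
                });
                threads.Add(t);
                t.Start();
            }

            foreach (Thread t in threads)
            {
                t.Join();
            }
        }
    }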

RageZ
0

Since your process is I/O bound, you should let the OS do your threading for you. Look at FileStream.BeginRead() for an example of how to queue up your reads. Your EndRead() callback can kick off the request for the next block of data, pointing back at itself to handle each subsequent completed block.
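Here is a rough sketch of that BeginRead/EndRead pattern for a single file. The path, buffer size, and block processing are placeholders, and a real implementation would need to handle matches that straddle block boundaries:

    using System;
    using System.IO;
    using System.Text;
    using System.Threading;

    class AsyncReadExample
    {
        static FileStream stream;
        static byte[] buffer = new byte[64 * 1024];
        static ManualResetEvent finished = new ManualResetEvent(false);

        static void Main()
        {
            // Hypothetical path; FileOptions.Asynchronous asks the OS to perform
            // the reads in the background instead of tying up a thread per file.
            stream = new FileStream(@"C:\data\a.txt", FileMode.Open, FileAccess.Read,
                                    FileShare.Read, buffer.Length, FileOptions.Asynchronous);
            stream.BeginRead(buffer, 0, buffer.Length, OnReadCompleted, null);

            finished.WaitOne();   // wait until the last block has been processed
            stream.Close();
        }

        static void OnReadCompleted(IAsyncResult ar)
        {
            int bytesRead = stream.EndRead(ar);
            if (bytesRead == 0)
            {
                finished.Set();   // end of file
                return;
            }

            // ... process this block, e.g. look for matching values ...
            string text = Encoding.ASCII.GetString(buffer, 0, bytesRead);

            // Queue up the read of the next block from the completion callback,
            // pointing back at this same handler, as described above.
            stream.BeginRead(buffer, 0, buffer.Length, OnReadCompleted, null);
        }
    }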

Also, by creating additional threads you give the OS more threads to manage. And if a different CPU happens to be picked to handle the completed read, you've lost all of the CPU cache locality of the thread where the read originated.

As you've found, you can't "speed up" an application just by adding threads.

No Refunds No Returns