While my multithreaded program is running, only 4 of the 8 available CPUs are being used. Why? What can I change to put all the CPUs to work?

  Parallel.ForEach(0, CalcList.Count-1)
  .NumTasks(nMax)
  .NoWait
  .Execute(
    procedure(const value: integer)
    begin
      CalcUnit.EntrySearch(value);
    end);

(nMax and CalcList.Count are both 16; the CPU is a hyper-threaded Intel i7.)

Thank you

Frits Molenkamp
  • FWIW hyper threading doesn't help much if at all for cpu bound tasks. – David Heffernan Nov 06 '14 at 18:43
  • Isn't there an option in OTL which allows you to limit OTL to using only physical cores and not virtual ones? – SilverWarior Nov 06 '14 at 19:25
  • @SilverWarior The OP is talking about 8 cores (16 hyper threaded) but only 4 of the cores are being used. Although it might be a good thing to check if the CPU really has 4 or 8 cores. – Graymatter Nov 06 '14 at 19:55
  • If I look at your other question your EntrySearch seems to be accessing the file system. Is that correct? If so you will be very unlikely to get anywhere close to maximum CPU usage as your process will be IO bound on the drive. – Graymatter Nov 06 '14 at 20:10
  • Yes, what does EntrySearch do? Try replacing it with code that just performs aimless arithmetic. – David Heffernan Nov 06 '14 at 20:28
  • @GrayMatter The Intel i7 has 4 CPUs, hyper-threaded to eight. Thank you. – Frits Molenkamp Nov 06 '14 at 21:13
  • @GrayMatter Thanks for your comment. Yes, every thread gets its own data file, and I understand that I shouldn't expect maximum CPU usage. But I do expect to see all 8 CPUs working. – Frits Molenkamp Nov 06 '14 at 21:24
  • @SilverWarior Thank you for your comments. I don't know if there is a switch in OTL. I studied the OTL "test-36 ParallelAggreate" and I see my 8 CPUs working, but I can't find anything about a "physical and/or virtual CPU switch". – Frits Molenkamp Nov 06 '14 at 21:28
  • @FritsMolenkamp I would test the code with a strictly CPU bound operation and see what happens. That way you rule out the IO as a bottleneck. Also, setting your tasks to 16 doesn't make any sense as that indicates that you want to run 16 threads concurrently. In a 4 core / 8 hyper environment that will result in task switching which is completely counter productive. – Graymatter Nov 06 '14 at 22:32
  • Please listen to us. If your work is I/O bound then throwing more CPUs at it won't help and will most likely just make your program slower. – David Heffernan Nov 06 '14 at 22:51
  • @FritsMolenkamp The OS won't bother giving your program all 8 CPUs if it doesn't need them due to IO bottleneck. – Disillusioned Nov 07 '14 at 15:56
  • @DavidHeffernan Thanks for your comments. I don't see the I/O bottleneck. Every thread gets its own file. If the program can add 3 more files for parallel calculation when it goes from single-threaded to multithreaded, then why not 8 files? Where can I find some documentation about this? Thank you. – Frits Molenkamp Nov 08 '14 at 09:26
  • @CraigYoung, Thanks for your comments. Where can I find more info about the I/O bottleneck? thanks – Frits Molenkamp Nov 08 '14 at 09:28
  • Your program is I/O bound rather than CPU bound. Threading won't help. – David Heffernan Nov 08 '14 at 09:29
  • @DavidHeffernan thanks for your fast comment. Do you think I should quit Multithreading for my program due to the I/O bound or is there another way to solve this? Thanks again. – Frits Molenkamp Nov 08 '14 at 09:50
  • If the perf is dominated by I/O then threading won't improve perf – David Heffernan Nov 08 '14 at 09:51
  • @DavidHeffernan for now I get a disappointing performance increase of 200% – Frits Molenkamp Nov 08 '14 at 10:38
  • How would you expect more CPUs to speed up reading from a disk? – David Heffernan Nov 08 '14 at 10:42
  • @DavidHeffernan To test the data read speed I moved the data files to an SSD. Surprisingly, I got a performance decrease of 20% compared to having the data files on a mechanical WD HDD. – Frits Molenkamp Nov 08 '14 at 11:25
  • Testing disk access is tricky. You have to account for the cache. If you are working on the same files over and over, the system will just cache them in memory. If they are not in the cache then your process will be even more I/O bound! – David Heffernan Nov 08 '14 at 11:26
  • @DavidHeffernan Each thread loops over its data file about 350 times, so the cache should not be an issue. But looking at the heavy reading, you are right; I guess I should rest my case. Thanks for your help. How do I close my question on Stack Exchange? – Frits Molenkamp Nov 08 '14 at 11:46
  • It's already closed. You could accept Graymatter's answer which I believe to be on the money. – David Heffernan Nov 08 '14 at 11:57

1 Answer

I just did a test on an i7 2600 (4 cores, 8 hyper-threads) using OTL. A simple Parallel.ForEach loop makes use of all 8, both with and without the .NumTasks call that you have. There is no problem with the library.

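begin
  Parallel.ForEach(0, 100)
  //.NumTasks(16)
  .Execute(
    procedure(const value: integer)
    var
      newValue: Single;
      I: Integer;
    begin
      // Pure arithmetic busy work with no I/O, so each task stays CPU bound
      newValue := value;
      for I := 1 to 100000000 do
      begin
        newValue := newValue * I;
        newValue := newValue / I;
      end;
    end);
  // No .NoWait here, so Execute returns only after all iterations finish
  ShowMessage('Done!');
end;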

My guess is that the problem is in your code. Disk accesses in threads are a good way to counter the benefits of using threads in the first place.

I don't know enough about your code, but you should look at reading the data in a single thread and then threading only the actual processing of that data.
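As a rough illustration of that split, here is a minimal sketch, not your actual code. It assumes the data files fit in memory; the folder `C:\Data`, the `*.dat` mask, and the `Checksum` function are hypothetical stand-ins (`Checksum` takes the place of the CPU-heavy part of EntrySearch). Only `Parallel.ForEach(...).Execute` is taken from the code above.

```pascal
program LoadThenProcess;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Types,
  System.IOUtils,
  OtlParallel;

// Hypothetical stand-in for the CPU-heavy part of EntrySearch.
// It touches only bytes that are already in memory.
function Checksum(const data: TBytes): Int64;
var
  b: Byte;
begin
  Result := 0;
  for b in data do
    Inc(Result, b);
end;

var
  fileNames: TStringDynArray;
  buffers: array of TBytes;
  results: array of Int64;
  i: Integer;
begin
  // Hypothetical data folder and file mask.
  fileNames := TDirectory.GetFiles('C:\Data', '*.dat');
  SetLength(buffers, Length(fileNames));
  SetLength(results, Length(fileNames));

  // Phase 1: read every file sequentially on one thread,
  // so the drive sees a single stream of requests.
  for i := 0 to High(fileNames) do
    buffers[i] := TFile.ReadAllBytes(fileNames[i]);

  // Phase 2: only the in-memory processing runs in parallel.
  // Each iteration writes to its own slot, so no locking is needed.
  if Length(buffers) > 0 then
    Parallel.ForEach(0, High(buffers))
      .Execute(
        procedure(const value: integer)
        begin
          results[value] := Checksum(buffers[value]);
        end);

  for i := 0 to High(results) do
    Writeln(fileNames[i], ': ', results[i]);
end.
```

If the files are too large to hold in memory at once, the same idea may still apply in batches: read a few files, process them in parallel, then read the next few.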

I see that you also have .NoWait specified. Are you saving the return value of your Parallel.ForEach? It's a good idea to save this value, because otherwise your code will block when the OnClick handler exits. See gabr's answer to this question:

Why is OmniThreadLibrary's ForEach blocking main thread?
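A minimal sketch of saving that return value, assuming a VCL form; the names `TfrmMain`, `btnStart` and `FLoop` are hypothetical, and the form needs a matching .dfm. The idea is that the loop interface lives in a field rather than in a temporary that is released when the handler returns.

```pascal
unit MainForm;

interface

uses
  System.Classes, Vcl.Controls, Vcl.Forms, Vcl.StdCtrls,
  OtlParallel;

type
  TfrmMain = class(TForm)
    btnStart: TButton;
    procedure btnStartClick(Sender: TObject);
  private
    // Keeps the parallel loop alive after btnStartClick returns.
    FLoop: IOmniParallelLoop<integer>;
  end;

var
  frmMain: TfrmMain;

implementation

{$R *.dfm}

procedure TfrmMain.btnStartClick(Sender: TObject);
begin
  // Store the configured loop in the field first, then start it.
  FLoop := Parallel.ForEach(0, 15).NumTasks(8).NoWait;
  FLoop.Execute(
    procedure(const value: integer)
    begin
      // Per-item work goes here, e.g. the call to EntrySearch.
    end);
end;

end.
```

The linked answer covers how to release the reference once the loop has finished.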

Graymatter
  • Thanks for your comments and code. Removing .NumTasks(nMax) from my code reduces my CPU usage. Yes, the problem is in my code; your code example makes all 8 of my CPUs run at 100%. I am indeed trying to multithread the 'old' single-threaded data calculations. Thanks to the earlier comments, I am trying to learn more about the I/O bottleneck. Thanks again. – Frits Molenkamp Nov 08 '14 at 09:34