
I understand a `Barrier` can be used to have several tasks synchronise their completion before a second phase runs.

I would like to have several tasks synchronise multiple steps like so:

1. state is 1;
2. Task1 runs and pauses, waiting for the state to become 2;
3. Task2 runs and pauses, waiting for the state to become 2;
4. Task2 is the final task to finish, so it advances the state to 2;
5. Task1 runs and pauses, waiting for the state to become 3;
6. Task2 runs and pauses, waiting for the state to become 3;
7. Task2 is the final task to finish, so it advances the state to 3;
8. state 3 is the final state, so all tasks exit.

I know I can spin up new tasks at the end of each state, but since each step completes quickly, I want to avoid the overhead of creating new tasks for every step.

I can run the above synchronously using for loops, but the final state can be as high as 100,000, so I would like to use more than one thread to run the process faster, as the process is CPU-bound.
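
The synchronous baseline is just two nested loops, something like this (a simplified sketch; `DoWork`, `finalState` and `taskCount` are placeholders for my actual code):

for (int state = 1; state <= finalState; state++)
{
    for (int task = 0; task < taskCount; task++)
    {
        DoWork(task, state); // placeholder for the per-task, per-state work
    }
}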

I have tried using a counter to track the number of completed tasks, incremented by each task on completion. The task that completes last advances the state to the next state. All completed tasks then wait using `while (iterationState == state) await Task.Yield();`, but the performance is terrible, and it seems to me a very crude way of doing it.
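
Roughly, each of my tasks does something like this (a simplified sketch; `DoWork` is a placeholder, and `state`, `completedCount`, `taskCount` and `finalState` are shared fields):

async Task RunSteps()
{
    while (state < finalState)
    {
        DoWork(state); // placeholder for the per-state work

        int iterationState = state;
        if (Interlocked.Increment(ref completedCount) == taskCount)
        {
            // Final task to complete this step: reset the counter and advance the state.
            Interlocked.Exchange(ref completedCount, 0);
            Interlocked.Increment(ref state);
        }

        // All other tasks spin-wait until the state moves on - this is the slow part.
        while (iterationState == state)
            await Task.Yield();
    }
}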

What is the most efficient way to get the above done? Surely there is an optimised tool for this kind of staged synchronisation?

I'm using `Parallel.For`, creating 300 tasks, and each task needs to run through up to 100,000 states. Each task completes one state in less than a second, and creating 300 * 100,000 tasks is a huge overhead that makes running the whole thing synchronously much faster, even on a single thread.

So I'd like to create 300 tasks and have those tasks synchronise moving through the 100,000 states. Hopefully the overhead of creating only 300 tasks instead of 300 * 100,000, plus the overhead of optimised synchronisation between them, will beat the synchronous single-threaded run.

Each state must complete fully before the next state can be run.

So - what's the optimal synchronisation technique for this scenario? Thanks!

Ibraheem
  • Am I understanding this correctly - you can only run two tasks at a time, and in order to move to the next state you have to wait for both to finish and you may have up to 100,000 states? – pstrjds Apr 01 '18 at 15:06
  • Sounds like you need Monitor.Pulse and Monitor.Wait – Evk Apr 01 '18 at 15:06
  • So you do not want to use `ContinueWith`? – CodingYoshi Apr 01 '18 at 15:07
  • How about using parallel.foreach for each segment? What are you trying to achieve by constraining all jobs to be in the same phase at the same time. There might be a more natural way to do it with reactive extensions or the tpl. – gandaliter Apr 01 '18 at 15:09
  • @pstrjds that's correct. I'm actually using `Parallel.For` for the tasks to use as many CPU cores for the processing as possible. There are around 300 tasks created, and each need to run through up to 100,000 states. Obviously creating 300 * 100,000 tasks is a huge overhead that makes running the whole thing synchronously much faster. I'd like to create 300 Tasks and have these Tasks synchronise moving through the 100,000 states. Hope that makes it clearer. – Ibraheem Apr 01 '18 at 15:11
  • @CodingYoshi as I understand it, `ContinueWith` runs after a Task completes, and I would need to create a new Task for the next step. I'm trying to avoid creating a new Task for each step and instead have Tasks running through each step in sync. – Ibraheem Apr 01 '18 at 15:21
  • Why are you avoiding creating new tasks? Learning purposes or performance? If the latter, why not create tasks and then do some benchmarking, if slow then look for another option. It may be fast enough. – CodingYoshi Apr 01 '18 at 15:32
  • @CodingYoshi thanks, I've already benchmarked it, creating the 300 * 100,000 tasks by running each state through a `Parallel.For` takes longer than a synchronous implementation. Even though all CPU cores are used, the synchronous implementation using a single thread runs much faster. The overhead of creating 300 * 100,000 tasks must be costing more CPU cycles than is worth it. I'd like to create only 300 tasks and then keep those tasks running in sync with the state. Just not sure on how to do the synchronisation. @Evk mentioned `Monitor.Pulse` and `Monitor.Wait`? – Ibraheem Apr 01 '18 at 15:36
  • Can you switch to dataflow library? – Tanveer Badar Apr 01 '18 at 17:16
  • Maybe posting some pseudo code would help clarify what you are doing, but is it possible for you to have a producer thread that queues up all the work for the first stage and then blocks until the queue is empty (using a countdown event or something of that nature) and then when it is signaled queues up the next stage of work, etc. Then your worker threads just pull from the queue and when the queue is empty they block until the queue is full again. – pstrjds Apr 01 '18 at 17:24
  • `Parallel.For` doesn't create Tasks and there is no need for any other synchronization on it because it waits until all parallel activities are complete. You can process all of the items for one stage using Parallel.For. – Ian Mercer Apr 01 '18 at 23:08
  • How long does a single step (1/30/100,000 of the total) take to execute? If that is microseconds then, yes, you are wasting your time trying to parallelize it. – Ian Mercer Apr 01 '18 at 23:11
  • @IanMercer this is the problem - most take less than a ms to execute. Some take hundreds of ms. So my parallelising code is indeed a waste of time for the vast majority of executions, but having the CPU running at 5% during the longer executions is a waste of resources - I'm trying to find a solution that efficiently manages those <1ms executions, whilst also taking advantage of the full CPU resources for the ones that take 100s of ms. Which executions will take >1ms is unknown at design time. – Ibraheem Apr 02 '18 at 14:30
  • I think @IanMercer is correct, his solution is cleaner than mine. The Parallel.For should be the optimal solution and has the benefit of "auto tuning" itself based on the local machine it is running on, whereas using my solution you would have to figure out the optimal thread count for each machine you moved to. With all that said, without seeing the code that you are trying to parallelize, it really is just a guessing game at providing solutions. – pstrjds Apr 03 '18 at 07:59
  • @Ibraheem Can you determine, prior to executing a single item, how long it will take (process quick ones synchronously and slow ones with `Parallel.For`)? – Ian Mercer Apr 03 '18 at 15:39

2 Answers


`while (iterationState == state) await Task.Yield();` is indeed a terrible way to synchronize across your 300 tasks (and no, 300 tasks isn't necessarily super-expensive: you'll only get a reasonable number of threads allocated).

The key problem here isn't the `Parallel.For`, it's synchronizing across 300 tasks to wait efficiently until each of them has completed a given phase.

The simplest and cleanest solution here would probably be a `for` loop over the stages and a `Parallel.For` over the part you want parallelized (`stageCount` and `ProcessItem` below stand in for your stage count and per-item work):

for (int stage = 0; stage < stageCount; stage++)
{
    // The next line blocks until all 300 items have completed this stage;
    // it will use thread pool threads as necessary.
    Parallel.For(0, 300, i => ProcessItem(i, stage));
}

No extra synchronization primitives needed, no spin-waiting consuming CPU, no needless thrashing between threads trying to see if they are ready to progress.
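
If you want to experiment with the degree of parallelism (as discussed in the comments), `Parallel.For` also accepts a `ParallelOptions` instance; a minimal sketch, using the same hypothetical `ProcessItem`:

var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
for (int stage = 0; stage < stageCount; stage++)
{
    // Caps the number of concurrent workers instead of letting the
    // scheduler decide - handy when benchmarking different thread counts.
    Parallel.For(0, 300, options, i => ProcessItem(i, stage));
}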

Ian Mercer
  • I tried this already and the performance was much slower than using a for loop within a for loop. My conclusion was the overhead of creating 300 * 100,000 tasks was too expensive compared to the performance gains of partitioning the processing across parallel threads. – Ibraheem Apr 01 '18 at 22:17
  • @Ibraheem Parallel.For doesn't "create tasks", see https://stackoverflow.com/questions/5009181/parallel-foreach-vs-task-factory-startnew – Ian Mercer Apr 01 '18 at 23:05
  • @Ibraheem and you tried it without any nasty "while ... Task.Yield" stuff? Maybe you should post the actual code you are trying to improve. – Ian Mercer Apr 01 '18 at 23:06
  • yes that's correct, it was the first thing I tried. I'm using `Parallel.For` to "parallelise" other areas of the project. My nasty synchronising code was my attempt to use long-running tasks instead of `Parallel.For`. – Ibraheem Apr 02 '18 at 14:27

I think I am understanding what you are trying to do, so here is a suggested way to handle it. Note - I am using `Action` as the element type for the blocking collection, but you can change it to whatever works best in your scenario.

// Shared variables
CountdownEvent workItemsCompleted = new CountdownEvent(300);
BlockingCollection<Action> workItems = new BlockingCollection<Action>();
CancellationTokenSource cancelSource = new CancellationTokenSource();

// Work item queue thread
for (int i = 0; i < stages; ++i)
{
    // Reset the countdown to the number of items in this stage.
    workItemsCompleted.Reset(workItemsForStage[i].Count);
    for (int j = 0; j < workItemsForStage[i].Count; ++j)
    {
        workItems.Add(() => { /* your work item here */ });
    }
    workItemsCompleted.Wait(token); // token should be passed in from cancelSource.Token
}

// Worker threads that are making use of the queue
// (token should be passed to the threads from cancelSource.Token)
while (!token.IsCancellationRequested)
{
    var item = workItems.Take(token); // blocks until an item is available or the token is cancelled
    item();
    workItemsCompleted.Signal();
}

You can use cancelSource from your main thread to cancel the running operations if you need to; in your worker threads you would then need to handle the OperationCanceledException. With this setup you can launch as many worker threads as you need and easily benchmark where you get your optimal performance (maybe it is with only 10 worker threads, etc.). Just launch as many workers as you want and then queue up the work items in the work item queue thread. It's basically a producer-consumer model, except that the producer queues up one phase of the work, then blocks until that phase is done, and then queues up the next round of work.
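
For example, launching the workers might look something like this (a sketch; `WorkerLoop` is a hypothetical method wrapping the worker loop above, and `workerCount` is whatever thread count you are benchmarking):

var token = cancelSource.Token;
var workers = Enumerable.Range(0, workerCount)
    .Select(_ => Task.Factory.StartNew(
        () => WorkerLoop(token),         // the consumer loop shown above
        token,
        TaskCreationOptions.LongRunning, // hint to give each worker its own thread
        TaskScheduler.Default))
    .ToArray();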

pstrjds
  • I didn't think of using a BlockingCollection, I'll give that solution a go and see how that affects performance – Ibraheem Apr 01 '18 at 22:16