Tasks are .NET's low-level building blocks. .NET almost always has a better high-level abstraction for specific concurrency paradigms.
To paraphrase Rob Pike (slides), concurrency is not parallelism is not asynchronous execution. What you're asking for is concurrent execution with a specific degree of parallelism. .NET already offers high-level classes that can do that, without resorting to low-level task handling.
At the end, I explain why these distinctions matter and how they're implemented using different .NET classes or libraries.
Dataflow blocks
At the highest level, the Dataflow classes allow creating a pipeline of processing blocks similar to a PowerShell or Bash pipeline, where each block can use one or more tasks to process input. By default, Dataflow blocks preserve message order, ensuring results are emitted in the order the input messages were received.
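The ordering guarantee is easy to check with a small sketch. The block below is only an illustration: the squaring function and the artificial delays are made up, and it assumes the System.Threading.Tasks.Dataflow library and C# top-level statements are available.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Up to 4 messages are processed at once.
var options = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 };
var square = new TransformBlock<int, int>(async n =>
{
    // Made-up latency: later messages finish sooner than earlier ones.
    await Task.Delay((10 - n) * 10);
    return n * n;
}, options);

for (var i = 1; i <= 8; i++)
{
    square.Post(i);
}
square.Complete();

// Drain the block's output until it reports that nothing more will arrive.
var ordered = new List<int>();
while (await square.OutputAvailableAsync())
{
    while (square.TryReceive(out var item))
    {
        ordered.Add(item);
    }
}

// Results come out in input order even though they completed out of order.
Console.WriteLine(string.Join(",", ordered));
```

Setting EnsureOrdered=false in the block options trades this guarantee for lower latency when the order doesn't matter.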
You'll often see combinations of blocks called meshes, not pipelines. Dataflow grew out of the Microsoft Robotics Framework and can be used to create a network of independent processing blocks. Most programmers just use it to build a pipeline of steps though.
In your case, you could use a TransformBlock to execute DoShopAndProcessResultAsync and feed the output either to another processing block, or a BufferBlock you can read after processing all results. You could even split Shop and Process into separate blocks, each with its own logic and degree of parallelism.
E.g.:
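var buffer = new BufferBlock<ShopResult>();
var blockOptions = new ExecutionDataflowBlockOptions {
    MaxDegreeOfParallelism = 3,
    // BoundedCapacity also counts the messages being processed, so it must be
    // at least MaxDegreeOfParallelism for all 3 workers to be used
    BoundedCapacity = 3
};
var shop = new TransformBlock<Vendor, ShopResult>(DoShopAndProcessResultAsync,
                                                  blockOptions);
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
shop.LinkTo(buffer, linkOptions);
// SendAsync waits while the block is full, so the loop can't flood it
foreach (var v in vendors)
{
    await shop.SendAsync(v);
}
// No more input; wait until every result has been passed on to the buffer
shop.Complete();
await shop.Completion;
buffer.TryReceiveAll(out IList<ShopResult> results);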
You can use two separate blocks to shop and process:
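// shopOptions and processOptions are ExecutionDataflowBlockOptions like blockOptions above,
// results is a BufferBlock<ShopResult> like buffer above
var shop = new TransformBlock<Vendor, ShopResponse>(DoShopAsync, shopOptions);
var process = new TransformBlock<ShopResponse, ShopResult>(DoProcessAsync, processOptions);
shop.LinkTo(process, linkOptions);
process.LinkTo(results, linkOptions);
foreach (var v in vendors)
{
    await shop.SendAsync(v);
}
// Completion propagates from shop to process because of PropagateCompletion
shop.Complete();
await process.Completion;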
In this case we await the completion of the last block in the chain before reading the results.
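To try the two-block pipeline without the real shop code, here is a self-contained sketch. FakeShopAsync and FakeProcessAsync are hypothetical stand-ins for DoShopAsync and DoProcessAsync, and strings and ints replace the Vendor, ShopResponse and ShopResult types; none of this comes from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical stand-in for DoShopAsync: pretends to call a vendor.
async Task<int> FakeShopAsync(string vendor)
{
    await Task.Delay(50);
    return vendor.Length;
}

// Hypothetical stand-in for DoProcessAsync.
Task<string> FakeProcessAsync(int response) => Task.FromResult($"processed:{response}");

// Each step gets its own degree of parallelism.
var shopOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3, BoundedCapacity = 3 };
var processOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 };
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

var shop = new TransformBlock<string, int>(FakeShopAsync, shopOptions);
var process = new TransformBlock<int, string>(FakeProcessAsync, processOptions);
var results = new BufferBlock<string>();

shop.LinkTo(process, linkOptions);
process.LinkTo(results, linkOptions);

foreach (var vendor in new[] { "a", "bb", "ccc", "dddd" })
{
    await shop.SendAsync(vendor);
}

// Completing the first block flows down the chain; await the last processing block.
shop.Complete();
await process.Completion;

results.TryReceiveAll(out IList<string> processed);
Console.WriteLine(string.Join(",", processed));
```

The shop step is the slow, I/O-bound one here, so it gets the higher degree of parallelism while the cheap process step runs one message at a time.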
Instead of reading from a buffer block, we could use an ActionBlock at the end to do whatever we want to do with the results, e.g. store them to a database. The results can be batched using a BatchBlock to reduce the number of storage operations.
...
var batch=new BatchBlock<ShopResult>(100);
var store=new ActionBlock<ShopResult[]>(DoStoreAsync);
shop.LinkTo(process,linkOptions);
process.LinkTo(batch,linkOptions);
batch.LinkTo(store,linkOptions);
...
shop.Complete();
await store.Completion;
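A runnable sketch of just the batching step, using ints instead of ShopResult and a list that records batch sizes in place of DoStoreAsync (both substitutions are assumptions made for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
var batchSizes = new List<int>();

// Group incoming messages into arrays of 3.
var batch = new BatchBlock<int>(3);

// An ActionBlock runs one delegate at a time by default, so the list needs no lock.
var store = new ActionBlock<int[]>(items => batchSizes.Add(items.Length));
batch.LinkTo(store, linkOptions);

for (var i = 0; i < 7; i++)
{
    batch.Post(i);
}

// Complete() flushes the final, partially filled batch before completing.
batch.Complete();
await store.Completion;

Console.WriteLine(string.Join(",", batchSizes));
```

Seven messages with a batch size of 3 produce two full batches and one leftover batch, so the store step runs three times instead of seven.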
Why do names matter
Tasks are the lowest-level building blocks, used to implement multiple paradigms. In other languages you'd see them described as Futures or Promises (e.g. JavaScript).
Parallelism in .NET means executing CPU-bound computations over a lot of data using all available cores. Parallel.ForEach will partition the input data into roughly as many partitions as there are cores and use one worker task per partition. PLINQ goes one step further: it lets you specify the computation with LINQ operators and uses algorithms optimized for parallel execution to map, filter, sort, group and collect results. Both are built for CPU-bound work, which is why Parallel.ForEach can't be used for async work at all.
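For comparison, this is roughly what data parallelism looks like. The numbers and the computations are made up for illustration; the point is that the work is synchronous and CPU-bound, with no await anywhere inside the loop bodies.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var numbers = Enumerable.Range(1, 1000).ToArray();

// PLINQ: describe the computation with LINQ operators, let the runtime partition it.
long evenSquares = numbers.AsParallel()
                          .Where(n => n % 2 == 0)
                          .Select(n => (long)n * n)
                          .Sum();

// Parallel.ForEach: each worker keeps a private subtotal and merges it once at the end.
long total = 0;
Parallel.ForEach(
    numbers,
    () => 0L,                                          // per-worker initial subtotal
    (n, loopState, subtotal) => subtotal + n,          // synchronous, CPU-bound body
    subtotal => Interlocked.Add(ref total, subtotal)); // merge step, must be thread-safe

Console.WriteLine($"{evenSquares} {total}");
```

The per-worker subtotal avoids contention on the shared variable, which is the kind of detail PLINQ's Sum handles on its own.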
Concurrency means executing multiple independent and often IO-bound jobs. At the lowest level you can use Tasks, but Dataflow, Rx.NET, Channels, IAsyncEnumerable etc. allow the use of high-level patterns like CSP/pipelines, event stream processing etc.
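As one illustration of the CSP style, here is a minimal producer/consumer sketch with Channels and IAsyncEnumerable. It assumes .NET Core 3.0 or later (for ReadAllAsync and await foreach); the integer payload and the capacity of 2 are arbitrary choices.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// A bounded channel makes the producer wait when the consumer falls behind.
var channel = Channel.CreateBounded<int>(2);

var producer = Task.Run(async () =>
{
    for (var i = 1; i <= 5; i++)
    {
        await channel.Writer.WriteAsync(i);
    }
    // Signal that no more items will be written so the reader loop can end.
    channel.Writer.Complete();
});

// ReadAllAsync exposes the channel as an IAsyncEnumerable<int>.
var received = new List<int>();
await foreach (var item in channel.Reader.ReadAllAsync())
{
    received.Add(item);
}
await producer;

Console.WriteLine(string.Join(",", received));
```

Unlike a Dataflow block, a channel only moves data; the processing loop and its degree of parallelism are up to the code that reads from it.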
Asynchronous execution means you don't have to block while waiting for I/O-bound work to complete.