Assume that I have a producer-consumer pattern like this. The post also explains why using TPL Dataflow might not be optimal in this specific case. That is why this question uses Tasks instead.
+----------+                          +----------------+                           +-----------+
|   Task   |                          |      Task      |                           |   Task    |
|Read files|-----BlockingCollection-->|Process values  |----BlockingCollection---->|Write files|
+----------+   |                      |of data logger 1|  |                        +-----------+
               |                      +----------------+  |
               |                      +----------------+  |
               |                      |      Task      |  |
               |-BlockingCollection-->|Process values  |--|
               |                      |of data logger 2|  |
               |                      +----------------+  |
              ...                         (n Tasks)      ...
In this implementation, reading and writing need to happen concurrently with the processing, so each of them uses its own Task. If I used blocking functions for reading and writing, this would be the way to go, but what about asynchronous reads and writes? I would like to know whether I have understood the use of async-await correctly in this specific case. Since I need the parallelism, the read and the write should still each occur in a separate Task. I do not want to waste CPU cycles waiting for I/O results, so async-await seems to be the solution.
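To make the diagram concrete, the blocking variant of the pipeline could be sketched roughly as follows. This is only an illustration under assumptions: it uses in-memory integers instead of real files, doubling a value stands in for the actual processing, and the names PipelineSketch and Run are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PipelineSketch
{
    // Runs the pipeline from the diagram on in-memory data and
    // returns everything the "Write files" stage received.
    public static List<int> Run(int[][] inputPerLogger)
    {
        int n = inputPerLogger.Length;

        // One bounded input queue per data logger, one shared output queue.
        var inputs = Enumerable.Range(0, n)
            .Select(_ => new BlockingCollection<int>(100))
            .ToArray();
        var output = new BlockingCollection<int>(100);
        var written = new List<int>();

        // "Read files": feeds each logger's queue, then marks it complete.
        Task reader = Task.Run(() =>
        {
            for (int i = 0; i < n; i++)
            {
                foreach (int value in inputPerLogger[i]) inputs[i].Add(value);
                inputs[i].CompleteAdding();
            }
        });

        // "Process values": one Task per logger (doubling is a placeholder).
        Task[] processors = inputs.Select(queue => Task.Run(() =>
        {
            foreach (int value in queue.GetConsumingEnumerable())
                output.Add(value * 2);
        })).ToArray();

        // "Write files": drains the output queue until it is marked complete.
        Task writer = Task.Run(() =>
        {
            foreach (int value in output.GetConsumingEnumerable())
                written.Add(value);
        });

        Task.WaitAll(processors);
        output.CompleteAdding(); // no more results will arrive
        Task.WaitAll(reader, writer);
        return written;
    }

    public static void Main()
    {
        var result = Run(new[] { new[] { 1, 2, 3 }, new[] { 10, 20 } });
        // The order across loggers is not deterministic, so sort before printing.
        Console.WriteLine(string.Join(", ", result.OrderBy(v => v)));
    }
}
```

In this sketch every stage blocks a thread while it waits on a queue or on I/O, which is exactly the part I am wondering about replacing with async-await.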
Take this pseudo-implementation as an example. ReadAllLinesAsync would be an asynchronous counterpart of File.ReadAllLines:
// Bounded queue: Add blocks once 100 files are waiting to be processed.
BlockingCollection<string[]> queue = new BlockingCollection<string[]>(100);
var pathsToFiles = files;
await Task.Run(async () =>
{
    // This is a dummy for the "Read files" object's function call.
    for (int i = 0; i < pathsToFiles.Length; i++)
    {
        string[] file = await ReadAllLinesAsync(pathsToFiles[i]);
        queue.Add(file);
    }
    // Signal the consumers that no more items will be added.
    queue.CompleteAdding();
});
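For completeness, a minimal sketch of what such a ReadAllLinesAsync could look like. This is an assumption for illustration, not necessarily the implementation referred to above: it opens a FileStream with useAsync set to true and reads line by line with StreamReader.ReadLineAsync. The class name AsyncFileSketch and the 4096-byte buffer size are hypothetical choices.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class AsyncFileSketch
{
    // Hypothetical helper: reads all lines of a file asynchronously.
    public static async Task<string[]> ReadAllLinesAsync(string path)
    {
        var lines = new List<string>();

        // useAsync: true asks the OS for asynchronous (overlapped) file I/O.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, useAsync: true))
        using (var reader = new StreamReader(stream))
        {
            string line;
            // ReadLineAsync returns null once the end of the file is reached.
            while ((line = await reader.ReadLineAsync()) != null)
            {
                lines.Add(line);
            }
        }
        return lines.ToArray();
    }

    public static async Task Main()
    {
        // Write a small temporary file and read it back.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "a;1", "b;2" });
        string[] lines = await ReadAllLinesAsync(path);
        Console.WriteLine(lines.Length);
        File.Delete(path);
    }
}
```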
The first question is, does the example code use async-await properly?
The second question is, does this improve efficiency compared to blocking I/O, by not blocking a thread while waiting for I/O?
I have read several articles and posts on the async-await topic and have to say it is quite complicated, with all the dos and don'ts. Specifically, I read the articles by Stephen Cleary.