I've read a lot of articles that say "Parallel.ForEach blocks until it has finished all iterations".
That is not the case in my code: it starts to do some work, then executes the code after the loop, and then resumes the loop's work. Did I do something wrong?
TL;DR on what the code does: I have a number of HTML pages I want to process. I read them from disk and parse them, with at most four files in flight at a time, then display some stats about the data I've processed. So the workload is mixed: it does both I/O and computation.
    var parser = new Parser();
    var processedCounter = 0;
    var errorCount = 0;
    Parallel.ForEach(files, new ParallelOptions() { MaxDegreeOfParallelism = 4 }, async file =>
    {
        using (var reader = File.OpenText(file))
        {
            var fileText = await reader.ReadToEndAsync();
            try
            {
                var result = parser.ParseHtml(fileText);
                // bookkeeping goes here
                Interlocked.Increment(ref processedCounter);
                if ((processedCounter + 1) % 10 == 0)
                {
                    log.InfoFormat("passed {0} of {1}", processedCounter + 1, files.Length);
                }
            }
            catch (Exception ex)
            {
                Interlocked.Increment(ref errorCount);
                if ((errorCount + 1) % 10 == 0)
                {
                    log.InfoFormat("error_report - {0} errors of {1} total", errorCount + 1, files.Length);
                }
            }
        }
    });
    DisplayStats();
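
A stripped-down sketch of the same pattern may make the behavior easier to reproduce. It is not the original code: `Task.Delay` stands in for the file read, the parser and logging are removed, and the names `Program` and `completed` are made up for illustration. If the cause is the `async` lambda itself rather than the file or parser code, the first line printed should report fewer than 8 completed iterations.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var completed = 0;

        // Same shape as the loop above: an async lambda passed to Parallel.ForEach.
        Parallel.ForEach(
            Enumerable.Range(0, 8),
            new ParallelOptions { MaxDegreeOfParallelism = 4 },
            async i =>
            {
                // Hypothetical stand-in for reader.ReadToEndAsync().
                await Task.Delay(200);
                Interlocked.Increment(ref completed);
            });

        // Stand-in for DisplayStats(): runs as soon as Parallel.ForEach returns.
        Console.WriteLine("after loop: {0} of 8 completed", Volatile.Read(ref completed));

        // Give any still-running continuations time to finish, then check again.
        Thread.Sleep(1000);
        Console.WriteLine("one second later: {0} of 8 completed", Volatile.Read(ref completed));
    }
}
```

The only overload this lambda can bind to takes an `Action<int>`, so the loop has no task to wait on once the body reaches its first `await`.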