I've read a lot of articles that say "Parallel.ForEach blocks until it has finished all iterations".
That is not the case in my code: it starts doing some work, then executes the code after the loop, and only then resumes the loop body. Did I do something wrong?
TL;DR of what the code does: I have a number of HTML pages I want to process. I read them from disk one at a time, parse them, and then display some stats about the data I've processed. So the workload is mixed: it does both I/O and computation.

var parser = new Parser();

var processedCounter = 0;
var errorCount = 0;

Parallel.ForEach(files, new ParallelOptions() { MaxDegreeOfParallelism = 4 }, async file => 
{
    using (var reader = File.OpenText(file))
    {
        var fileText = await reader.ReadToEndAsync();

        try
        {
            var result = parser.ParseHtml(fileText);

            //bookkeeping goes here

            Interlocked.Increment(ref processedCounter);
            if ((processedCounter + 1) % 10 == 0)
            {
                log.InfoFormat("passed {0} of {1}", processedCounter + 1, files.Length);
            }
        }
        catch (Exception ex)
        {
            Interlocked.Increment(ref errorCount);
            if ((errorCount + 1) % 10 == 0)
            {
                log.InfoFormat("error_report - {0} errors of {1} total", errorCount + 1, files.Length);
            }
        }
    }
});

DisplayStats();
chester89
  • Are you doing `async` inside the Parallel.ForEach? – Scott Chamberlain Jul 08 '16 at 13:37
  • "The code is rather long (it's not an example code from some blog post), but I can post it if you want." A minimal reproduction case would be good. All the code would be bad. See: http://stackoverflow.com/help/mcve – Martijn Jul 08 '16 at 13:38
  • @ScottChamberlain I do, I read file text asynchronously – chester89 Jul 08 '16 at 13:38
  • Parallel.ForEach *blocks*, because it uses the original thread along with threadpool threads for processing. It's impossible to guess why your code doesn't finish without looking at it. – Panagiotis Kanavos Jul 08 '16 at 13:42
  • Parallel.ForEach does not support async methods. When you try to use one, the compiler turns your `async file => ...` into an `async void` delegate. Use a newer class, like one from [TPL Dataflow](https://msdn.microsoft.com/en-us/library/hh228603(v=vs.110).aspx), to do parallel async work. – Scott Chamberlain Jul 08 '16 at 13:43
  • BTW Parallel.ForEach and `async` don't work well together. If you want concurrent and asynchronous processing *with* input buffering as well, use an ActionBlock with a MaxDOP>1. Post the filenames to it and let its action read and process each file. ActionBlock does support async methods – Panagiotis Kanavos Jul 08 '16 at 13:44
  • @ScottChamberlain is Dataflow the hidden gem of .NET or what? It solves a myriad of problems but very few people know it – Panagiotis Kanavos Jul 08 '16 at 13:44
  • @PanagiotisKanavos I do know about TPL Dataflow, but hadn't thought about it in this case. Parallel.ForEach seemed intuitive – chester89 Jul 08 '16 at 14:02
  • Parallel.ForEach is meant for parallel data processing, not concurrent long operations. – Panagiotis Kanavos Jul 08 '16 at 14:10
  • @PanagiotisKanavos in my case one operation takes less than a second. I don't consider it long – chester89 Jul 08 '16 at 14:15
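
To make the point from the comments concrete: an `async` lambda passed to Parallel.ForEach binds to `Action<T>`, so it becomes `async void`, and each iteration counts as "finished" at its first `await`. Here is a minimal sketch of that effect. `AsyncVoidDemo`, `CompletedWhenLoopReturns`, and the `Task.Delay` standing in for `ReadToEndAsync` are made up for illustration and are not from the question's code.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncVoidDemo
{
    // Returns how many loop bodies had fully completed at the moment
    // Parallel.ForEach itself returned.
    public static int CompletedWhenLoopReturns(int itemCount)
    {
        var completed = 0;

        // The async lambda is converted to Action<int>, i.e. async void.
        // Parallel.ForEach only waits until each lambda hits its first await.
        Parallel.ForEach(Enumerable.Range(0, itemCount), async item =>
        {
            await Task.Delay(500);                 // hypothetical stand-in for ReadToEndAsync
            Interlocked.Increment(ref completed);  // runs after the loop has already returned
        });

        return Volatile.Read(ref completed);
    }

    public static void Main()
    {
        // Typically far fewer than 8, because the continuations are still pending.
        Console.WriteLine(CompletedWhenLoopReturns(8));
    }
}
```

That matches the behaviour described in the question: the code after the loop (`DisplayStats()`) runs while the continuations after `await` are still pending, and they resume later on threadpool threads.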

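The ActionBlock suggestion from the comments could look roughly like the sketch below. It is only a sketch under some assumptions: `DataflowSketch`, `ProcessFilesAsync`, and the `parse` delegate (a stand-in for `parser.ParseHtml`) are hypothetical names, the bookkeeping and logging are left out, and on older frameworks the `System.Threading.Tasks.Dataflow` NuGet package has to be referenced.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // may need the System.Threading.Tasks.Dataflow NuGet package

public static class DataflowSketch
{
    // 'parse' is a hypothetical stand-in for parser.ParseHtml from the question.
    public static async Task<(int processed, int errors)> ProcessFilesAsync(
        IEnumerable<string> files, Action<string> parse)
    {
        var processed = 0;
        var errors = 0;

        // The async lambda binds to Func<string, Task>, so the block
        // tracks each file until its task has actually completed.
        var block = new ActionBlock<string>(async file =>
        {
            try
            {
                using (var reader = File.OpenText(file))
                {
                    var text = await reader.ReadToEndAsync();
                    parse(text);
                }
                Interlocked.Increment(ref processed);
            }
            catch (Exception)
            {
                Interlocked.Increment(ref errors);
            }
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        foreach (var file in files)
        {
            block.Post(file);   // queue the file name; the block reads and parses it
        }

        block.Complete();        // no more input
        await block.Completion;  // completes only after every queued file is done

        return (processed, errors);
    }
}
```

With something like this, `DisplayStats()` would be called after `await ProcessFilesAsync(files, text => parser.ParseHtml(text))`, so it runs only once all files have been handled. For the "every 10th file" log lines, using the value returned by `Interlocked.Increment` avoids reading the counter a second time without synchronization.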