I am using Parallel.ForEach to extract a bunch of zipped files and copy them to a shared folder on a different machine, where a BULK INSERT process is then started. This all works well, but I have noticed that as soon as some big files come along, no new tasks are started. I assume this is because some files take longer than others, so the TPL starts scaling down and stops creating new Tasks. I have set MaxDegreeOfParallelism to a reasonable number (8). When I look at the CPU activity, I can see that the SQL Server machine is below 30% most of the time, and even lower when it sits on a single BULK INSERT task. I think it could do more work. Can I somehow force the TPL to create more simultaneously processed Tasks?
-
Could it be all threads are waiting for access to the same drive? – Emond Mar 14 '18 at 15:31
-
Accessing files and updating a database are inherently IO bound; doing that in parallel just causes contention for shared resources. There are many options for doing this type of ETL. – JSteward Mar 14 '18 at 15:34
-
Do you know that the work is indeed running on separate threads? i.e. what do you see when you print out the current thread ID? – p e p Mar 14 '18 at 15:36
-
@ErnodeWeerd Good idea, but I don't think that this is the case. It starts out processing thousands of smaller files, pretty fast, as expected. As soon as a single big file is processed, Task creation slows down. I could understand if there were a waiting situation, as in "8 tasks waiting for bulk insert to finish" or "8 tasks waiting for files to be copied", but that is not the case. It is a thousand files being finished in parallel with no problem, but then it waits on a single task, with no new tasks being created. – Mar 14 '18 at 15:36
-
Rather than using parallel processing you might find it better to use asynchronous processing, so threads will not block on IO and can be used for other tasks. – juharr Mar 14 '18 at 15:50
-
@pep I have added the `Thread.CurrentThread.ManagedThreadId` per Task to our Console Output and they are all unique. – Mar 14 '18 at 16:05
-
@juharr Every task is run on its own thread, so I assumed it is already async? I know this is hard to diagnose without code or output, only from a description of my problem. – Mar 14 '18 at 16:07
-
@UrbanEsc If you're not using the async/await pattern then it's likely not async. The idea of async isn't about stuff running on separate threads; it's about threads not being blocked while IO occurs, which would leave them doing nothing instead of other work. – juharr Mar 14 '18 at 16:15
-
Curious - how do you know no new tasks are created once a single large file is encountered? Is this a console application? Could this apply: https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/potential-pitfalls-in-data-and-task-parallelism#avoid-executing-parallel-loops-on-the-ui-thread? – p e p Mar 14 '18 at 16:16
-
@pep It is a console application, but the output of the console app has its own display queue (it is enqueued and processed on a different thread, with lower priority) – Mar 14 '18 at 16:19
-
Instead of Parallel.ForEach, I have now employed code similar to this answer: https://stackoverflow.com/a/14075286/604613. – Mar 14 '18 at 16:25
-
@UrbanESC You can raise the minimum thread count in the ThreadPool – johnny 5 Mar 14 '18 at 19:45
1 Answer
The reason is most likely the way Parallel.ForEach processes items by default. If you use it on an array or something else that implements IList (so that the total length and an indexer are available), it will split the whole workload into batches, and a separate thread will then process each batch. That means that if the batches have different "sizes" (by size I mean the time it takes to process them), the "small" batches will complete faster.
For example, let's look at this code:
var delays = Enumerable.Repeat(100, 24).Concat(Enumerable.Repeat(2000, 4)).ToArray(); // 24 fast items, then 4 slow ones
Parallel.ForEach(delays, new ParallelOptions() {MaxDegreeOfParallelism = 4}, d =>
{
    Thread.Sleep(d);
    Console.WriteLine("Done with " + d);
});
If you run it, you will see that all the "100" (fast) items are processed quickly and in parallel. However, all the "2000" (slow) items are processed at the end, one by one, without any parallelism at all. That's because all the "slow" items end up in the same batch. The workload was split into 4 batches (MaxDegreeOfParallelism = 4), and the first 3 contain only fast items, so they complete quickly. The last batch contains all the slow items, and the single thread dedicated to that batch processes them one by one.
You can "fix" that for your situation either by ensuring that items are distributed evenly (so that "slow" items are not all together in source collection), or for example with custom partitioner:
var delays = Enumerable.Repeat(100, 24).Concat(Enumerable.Repeat(2000, 4)).ToArray();
var partitioner = Partitioner.Create(delays, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partitioner, new ParallelOptions {MaxDegreeOfParallelism = 4}, d =>
{
    Thread.Sleep(d);
    Console.WriteLine("Done with " + d);
});
NoBuffering ensures that items are handed out one at a time, which avoids the problem.
Using another means to parallelize your work (such as SemaphoreSlim or BlockingCollection) is also an option.
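For illustration only, here is a minimal sketch of the SemaphoreSlim approach (not from the original answer), reusing the delays array from above and assuming it runs inside an async method: the semaphore caps how many items are in flight at once, regardless of how the source is partitioned.
var delays = Enumerable.Repeat(100, 24).Concat(Enumerable.Repeat(2000, 4)).ToArray();
var throttle = new SemaphoreSlim(4); // at most 4 items processed concurrently
var tasks = delays.Select(async d =>
{
    await throttle.WaitAsync();
    try
    {
        await Task.Delay(d); // stand-in for the real work (extract, copy, BULK INSERT)
        Console.WriteLine("Done with " + d);
    }
    finally
    {
        throttle.Release();
    }
}).ToList();
await Task.WhenAll(tasks);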
