
I have some code that I am currently optimizing for concurrency on multicore architectures. In one of my classes, I found a nested foreach loop. Basically, the outer loop iterates through an array of NetworkInterface objects, and the inner loop iterates through each network interface's IP addresses.

It got me thinking: is having nested Parallel.ForEach loops necessarily a good idea? After reading this article (Nested Parallel.ForEach Loops on the same list?) I am still unsure what applies where in terms of efficiency and parallel design. That example is talking about Parallel.ForEach statements applied to a list where both loops perform operations on that same list.

In my example, the loops are doing different things, so, should I:

  1. Use nested Parallel.ForEach loops?
  2. Use Parallel.ForEach on the parent loop and leave the inner loop as-is?
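To make the two options concrete, here is a minimal sketch of both loop shapes. It assumes a hypothetical `Nic` record (a name plus a list of address strings) standing in for `NetworkInterface`, so it runs without touching real network hardware; the `ConcurrentBag` only collects the visited (interface, address) pairs so the result can be checked.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for NetworkInterface: an interface name plus its IP addresses.
public sealed record Nic(string Name, IReadOnlyList<string> Addresses);

public static class LoopShapes
{
    // Option 1: nested Parallel.ForEach. Both the interfaces and each
    // interface's addresses are handed to the parallel machinery.
    public static int VisitNested(IEnumerable<Nic> nics)
    {
        var visited = new ConcurrentBag<(string Name, string Ip)>();
        Parallel.ForEach(nics, nic =>
        {
            Parallel.ForEach(nic.Addresses, ip => visited.Add((nic.Name, ip)));
        });
        return visited.Count;
    }

    // Option 2: parallel outer loop, plain sequential inner loop.
    public static int VisitOuterOnly(IEnumerable<Nic> nics)
    {
        var visited = new ConcurrentBag<(string Name, string Ip)>();
        Parallel.ForEach(nics, nic =>
        {
            foreach (string ip in nic.Addresses)
            {
                visited.Add((nic.Name, ip));
            }
        });
        return visited.Count;
    }
}
```

Both methods visit the same pairs; they differ only in how much work is handed to the parallel scheduler, which is the trade-off the answers weigh.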
Theodor Zoulias
Matthew Layton

3 Answers


A Parallel.ForEach does not necessarily execute in parallel -- it is just a request to do so if possible. Therefore, if the execution environment does not have the CPU power to execute the loops in parallel, it will not do so.

If the actions on the loops are not related (i.e., if they are separate and do not influence each other), I see no problem using Parallel.ForEach both on inner and outer loops.

It really depends on the execution environment. You could do timing tests if your test environment is similar enough to the production environment, and then determine what to do. When in doubt, test ;-)

Good luck!

Roy Dictus
  • Couldn't disagree more. Yes, the scheduler behind Parallel.ForEach may not spawn separate threads, but you're paying a lot more overhead, from either threads or a scheduler, without having any scientific data to back it up. – M Afifi Sep 27 '12 at 09:48
  • @MAfifi: Read my answer again, please. – Roy Dictus Sep 27 '12 at 09:49
  • Parallelizing the inner loop will add some overhead. So, you most likely will get worse performance by parallelizing the inner loop too. – svick Sep 27 '12 at 10:03
  • Thanks. In my case it is impossible to know which environments will be used as my code is part of an API, so it may be used on many different architectures. – Matthew Layton Sep 27 '12 at 10:14
  • @activwerx: In that case I would just optimize for what you consider likely to be the most common setup. That would mean taking an "average" server PC, testing and measuring. Nested parallel loops would probably cause overhead in this case. It would be different if you knew in advance that your code would run on massive hardware; then the parallelism would surely pay off. – Roy Dictus Sep 27 '12 at 12:01
  • @RoyDictus I have an idea... is it possible to dynamically optimize the code according to the machine architecture? I.e., if the machine has, for example, 2 or more cores, then run tasks concurrently (Parallel.For(...)); if it has fewer than 2 cores, use a standard loop (for(x, y, z)). Could this work? – Matthew Layton Sep 27 '12 at 15:00
  • @activwerx: I suppose that in theory this is possible (you may request hardware info and act accordingly), but in practice you don't know whether you have access to those cores, and even then you could not be sure that Parallel.ForEach forces an optimization. Only on dedicated hardware can you be sure of this, or when you know in advance that the administrators will assign n cores to your process... – Roy Dictus Oct 01 '12 at 09:32
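A rough sketch of the idea discussed in these comments, choosing the loop form from the reported processor count. `AdaptiveLoop` is a hypothetical helper name; `Environment.ProcessorCount` is the real .NET property for the number of logical processors available to the process. As pointed out above, that number says nothing about whether those cores are actually free, so this is a heuristic, not a guarantee of better performance.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AdaptiveLoop
{
    // Hypothetical helper: use Parallel.ForEach only when the runtime reports
    // at least two logical processors; otherwise fall back to a plain foreach.
    // A high ProcessorCount does not mean those processors are idle.
    public static void ForEach<T>(IEnumerable<T> source, Action<T> body)
    {
        if (Environment.ProcessorCount >= 2)
        {
            Parallel.ForEach(source, body);
            return;
        }

        foreach (T item in source)
        {
            body(item);
        }
    }
}
```

Because the body may run concurrently on multi-core machines, it still has to be thread-safe (for example, using `Interlocked` for shared counters), even though it runs sequentially on a single core.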

The answer is: it depends.

  1. What are you doing with the IP address once you have it?
  2. How long does each step take?

Threads are not cheap: they take time to create and memory to exist. If you're not doing something computationally expensive with those IP addresses, and you're using the wrong type of collection for concurrent access, you're almost certainly slowing down your application.

Use Stopwatch to help you answer these questions.
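A minimal timing harness built on `System.Diagnostics.Stopwatch`, as one way to act on this advice. The `Timing.Measure` name and the warm-up/best-of-N policy are assumptions of this sketch, not a standard API.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the action once as a warm-up (JIT compilation, thread pool ramp-up),
    // then `runs` more times, and returns the fastest measured run.
    public static TimeSpan Measure(Action action, int runs = 5)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));

        action(); // warm-up run, not measured

        TimeSpan best = TimeSpan.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            if (sw.Elapsed < best)
            {
                best = sw.Elapsed;
            }
        }
        return best;
    }
}
```

Wrap each candidate (nested parallel loops, parallel outer loop only, plain sequential loops) in a lambda and compare the results on hardware close to production. Taking the fastest of several runs reduces noise from the OS and the GC; for more rigorous measurements, a dedicated benchmarking library such as BenchmarkDotNet is a better fit.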

M Afifi
  • Threads are expensive to create, which is exactly why `Parallel.ForEach()` uses the `ThreadPool`, so creating new threads most likely won't be a problem. – svick Sep 27 '12 at 10:05

My advice is to follow the second approach: Parallelize only the outer loop, and keep the inner loops sequential (for/foreach). Don't place Parallel.ForEach loops the one inside the other. The reasons are:

  1. The parallelization adds overhead. Each Parallel loop has to synchronize the enumeration of the source, start Tasks, watch cancellation/termination flags etc. By nesting Parallel loops you are paying this cost multiple times.

  2. Limiting the degree of parallelism becomes harder. The MaxDegreeOfParallelism option is not an ambient property that affects child loops. It limits only a single loop. So if you have an outer Parallel loop with MaxDegreeOfParallelism = 4 and an inner Parallel loop also with MaxDegreeOfParallelism = 4, the inner body might be invoked concurrently 16 times (4 * 4). It is still possible to enforce a sensible upper limit by configuring all loops with the same TaskScheduler, and specifically with the ConcurrentScheduler property of a shared ConcurrentExclusiveSchedulerPair instance.

  3. In case of an exception, you'll get a deeply nested AggregateException, which you'll have to Flatten.
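A minimal sketch of the shared-scheduler technique from point 2. `SharedLimit` and `PeakConcurrency` are hypothetical names used only for this demonstration, and the loops are `Parallel.For` over integer ranges rather than network interfaces. Both loops receive the same `ParallelOptions`, whose `TaskScheduler` is the `ConcurrentScheduler` of one `ConcurrentExclusiveSchedulerPair` created with a maximum concurrency level of `limit`, so the number of bodies running at once should stay at or below `limit` regardless of nesting.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class SharedLimit
{
    // Runs an outer and an inner Parallel.For that share one TaskScheduler
    // (the ConcurrentScheduler of a single ConcurrentExclusiveSchedulerPair)
    // and returns the highest number of inner bodies seen running at once.
    public static int PeakConcurrency(int outerCount, int innerCount, int limit)
    {
        var pair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, limit);
        var options = new ParallelOptions { TaskScheduler = pair.ConcurrentScheduler };

        object gate = new object();
        int running = 0, peak = 0;

        Parallel.For(0, outerCount, options, i =>
        {
            Parallel.For(0, innerCount, options, j =>
            {
                lock (gate)
                {
                    running++;
                    if (running > peak) peak = running;
                }
                Thread.Sleep(1); // simulate a little work so bodies can overlap
                lock (gate)
                {
                    running--;
                }
            });
        });

        pair.Complete(); // no more work will be queued to this pair
        return peak;
    }
}
```

Calling `PeakConcurrency(4, 8, 2)` should report a peak of at most 2. With the default scheduler and `MaxDegreeOfParallelism = 2` on each loop instead, the peak could reach 4 (2 * 2), as described above.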

I would also suggest considering a third approach: do a single Parallel loop on a flattened source sequence. For example instead of:

using System.Linq;
using System.Net.NetworkInformation;
using System.Threading.Tasks;

ParallelOptions options = new() { MaxDegreeOfParallelism = X };

Parallel.ForEach(NetworkInterface.GetAllNetworkInterfaces(), options, ni =>
{
    foreach (UnicastIPAddressInformation ip in ni.GetIPProperties().UnicastAddresses)
    {
        // Do stuff with ni and ip
    }
});

...you could do this:

var query = NetworkInterface.GetAllNetworkInterfaces()
    .SelectMany(ni => ni.GetIPProperties().UnicastAddresses, (ni, ip) => (ni, ip));

Parallel.ForEach(query, options, pair =>
{
    var (ni, ip) = pair;
    // Do stuff with ni and ip
});

This approach parallelizes only the "Do stuff" part. The calling of ni.GetIPProperties() is not parallelized: the IP addresses are fetched sequentially, for one NetworkInterface at a time. It also intensifies the parallelization within each NetworkInterface (the workers tend to process the addresses of the same NetworkInterface concurrently), which might not be what you want; you might prefer to spread the parallelization among many NetworkInterfaces. So this approach has characteristics that make it compelling for some scenarios and unsuitable for others.

One other case worth mentioning is when the objects in the outer and inner sequences are of the same type, and have a parent-child relationship. In that case check out this question: Parallel tree traversal in C#.

Theodor Zoulias