208

I am a bit confused about Parallel.ForEach.
What is Parallel.ForEach, and what exactly does it do?
Please don't reference any MSDN link.

Here's a simple example:

string[] lines = File.ReadAllLines(txtProxyListPath.Text);
List<string> list_lines = new List<string>(lines);

foreach (string line in list_lines)
{
    //My Stuff
}

How can I rewrite this example with Parallel.ForEach?

torial
SilverLight
  • This might have been answered here http://stackoverflow.com/questions/3789998/parallel-foreach-vs-foreachienumerablet-asparallel – Ujjwal Manandhar Sep 03 '12 at 17:20
  • 2
    @UjjwalManandhar That's actually quite different, as it's asking about the difference between the `Parallel` class and using PLINQ. – Reed Copsey Sep 03 '12 at 17:36
  • 19
    Others have answered how you can rewrite. So what does it do? It does an "action" on each item in the collection, just like a normal `foreach`. The difference is that the parallel version can do many "actions" at the same time. In most cases (depending on what computer is running the code, and how busy it is, and other stuff) it will be faster, and that's the most important advantage. Note that when you do it in parallel, you can not know in what *order* the items are processed. With a usual (serial) `foreach`, you are guaranteed that `lines[0]` comes first, then `lines[1]`, and so on. – Jeppe Stig Nielsen Sep 03 '12 at 17:53
  • 1
    @JeppeStigNielsen It will *not* always be faster as there is significant overhead with making things parallel. It depends on the size of the collection you are iterating on and the action within. The correct thing to do is to actually *measure* the difference between using Parallel.ForEach() and using foreach(). Many times a normal foreach() is faster. – Dave Black Mar 18 '16 at 16:51
  • 3
    @DaveBlack Sure. One will have to _measure_ whether it is faster or slower, in each case. I was just trying to describe parallelization in general. – Jeppe Stig Nielsen Mar 18 '16 at 18:11

6 Answers

309

Foreach loop:

  • Iterations take place sequentially, one by one.
  • The foreach loop runs on a single thread.
  • The foreach loop is available in every version of .NET.
  • Execution of slow processes can be slower, as they're run serially.
    • Process 2 can't start until 1 is done. Process 3 can't start until 2 & 1 are done...
  • Execution of quick processes can be faster, as there is no threading overhead.

Parallel.ForEach:

  • Iterations can execute in parallel.
  • Parallel.ForEach uses multiple threads.
  • Parallel.ForEach is available in .NET Framework 4.0 and later.
  • Execution of slow processes can be faster, as they can be run in parallel.
    • Processes 1, 2, & 3 may run concurrently (see the reused threads in the example below).
  • Execution of quick processes can be slower, because of the additional threading overhead.

The following example demonstrates the difference between a traditional foreach loop and Parallel.ForEach when each iteration is slow.

Parallel.ForEach() Example

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
namespace ParallelForEachExample
{
    class Program
    {
        static void Main()
        {
            string[] colors = {
                                  "1. Red",
                                  "2. Green",
                                  "3. Blue",
                                  "4. Yellow",
                                  "5. White",
                                  "6. Black",
                                  "7. Violet",
                                  "8. Brown",
                                  "9. Orange",
                                  "10. Pink"
                              };
            Console.WriteLine("Traditional foreach loop\n");
            //start the stopwatch for the "foreach" loop
            var sw = Stopwatch.StartNew();
            foreach (string color in colors)
            {
                Console.WriteLine("{0}, Thread Id= {1}", color, Thread.CurrentThread.ManagedThreadId);
                Thread.Sleep(10);
            }
            Console.WriteLine("foreach loop execution time = {0} seconds\n", sw.Elapsed.TotalSeconds);
            Console.WriteLine("Using Parallel.ForEach");
            //start the stopwatch for "Parallel.ForEach"
            sw = Stopwatch.StartNew();
            Parallel.ForEach(colors, color =>
            {
                Console.WriteLine("{0}, Thread Id= {1}", color, Thread.CurrentThread.ManagedThreadId);
                Thread.Sleep(10);
            }
            );
            Console.WriteLine("Parallel.ForEach() execution time = {0} seconds", sw.Elapsed.TotalSeconds);
            Console.Read();
        }
    }
}

Output

Traditional foreach loop
1. Red, Thread Id= 10
2. Green, Thread Id= 10
3. Blue, Thread Id= 10
4. Yellow, Thread Id= 10
5. White, Thread Id= 10
6. Black, Thread Id= 10
7. Violet, Thread Id= 10
8. Brown, Thread Id= 10
9. Orange, Thread Id= 10
10. Pink, Thread Id= 10
foreach loop execution time = 0.1054376 seconds

Using Parallel.ForEach example

1. Red, Thread Id= 10
3. Blue, Thread Id= 11
4. Yellow, Thread Id= 11
2. Green, Thread Id= 10
5. White, Thread Id= 12
7. Violet, Thread Id= 14
9. Orange, Thread Id= 13
6. Black, Thread Id= 11
8. Brown, Thread Id= 10
10. Pink, Thread Id= 12
Parallel.ForEach() execution time = 0.055976 seconds
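
The Thread.Sleep(10) above stands in for slow work, which is the case where Parallel.ForEach tends to help. For contrast, here is a minimal sketch with a deliberately trivial loop body, so that the threading overhead can be measured on your own machine. The class name, the array size, and the squaring operation are assumptions made for illustration, and a single Stopwatch run like this is only a rough indication (JIT warm-up and machine load affect it), not a proper benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class OverheadSketch
{
    static void Main()
    {
        // Hypothetical input: one million integers; the loop body is deliberately trivial.
        int[] numbers = Enumerable.Range(0, 1000000).ToArray();
        long[] squares = new long[numbers.Length];

        var sw = Stopwatch.StartNew();
        foreach (int n in numbers)
        {
            squares[n] = (long)n * n;
        }
        Console.WriteLine("foreach:          {0} ms", sw.Elapsed.TotalMilliseconds);

        sw = Stopwatch.StartNew();
        Parallel.ForEach(numbers, n =>
        {
            // Each iteration writes to its own array slot, so no locking is needed.
            squares[n] = (long)n * n;
        });
        Console.WriteLine("Parallel.ForEach: {0} ms", sw.Elapsed.TotalMilliseconds);
    }
}
```

Whichever version comes out ahead for your workload, the point is to measure rather than assume.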
ruffin
Jignesh.Raj
  • 67
    I don't really agree with your 'claim' that Parallel.ForEach is (always) faster. This really depends on how heavy the operation inside the loop is. It may or may not be worth the overhead of introducing parallelism. – Martao Jan 16 '14 at 09:23
  • @Martao in what cases isn't it faster? (Except when you have only one core/virtual core(thread) available) – Highmastdon Feb 23 '15 at 15:30
  • 1
    Well, the parallel foreach means that separate threads are set up to execute the code in the loop body. Even though .NET does have an efficient mechanism to do this, this is considerable overhead. So, if you just have to do a simple operation (e.g. a sum or multiplication), the parallel foreach should not be faster. – Martao Feb 24 '15 at 08:59
  • 3
    @Jignesh this is not even a good measurement example, so I would not refer to it at all. Remove "Thread.Sleep(10);" from each loop body and try it again. – st35ly Jun 09 '15 at 01:57
  • 1
    @Martao is right; the problem is the object locking overhead, which can make the parallel approach take longer than the sequential one. – st35ly Jun 09 '15 at 01:59
  • 8
    @stenly I think the Sleep is precisely the reason why it is a _good_ example. You would not use a PFE with fast single iterations (as Martao explained) - so this answer is making the iteration slow, and the (correct) advantage of PFE is highlighted. I agree though that this needs to be explained in the answer, a bold "is always faster" is very misleading. – mafu Jan 22 '16 at 11:47
  • 1
    @Highmastdon - take a read at when Parallel can be slower. http://blogs.msdn.com/b/pfxteam/archive/2009/06/06/9703059.aspx – Dave Black Mar 18 '16 at 17:00
  • But he stated in his answer "Execution is faster (if single iterations are slow)". It has to be measured in edge cases, where the operation inside the loop is not that slow. – Legends Aug 26 '16 at 22:41
  • 1
    See [MSDN's *Do Not Assume That Parallel Is Always Faster*](https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/potential-pitfalls-in-data-and-task-parallelism#do-not-assume-that-parallel-is-always-faster) and [*Avoid Over-Parallelization*](https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/potential-pitfalls-in-data-and-task-parallelism#avoid-over-parallelization): "*The basic rule [is] parallel loops that have few iterations and fast user delegates are unlikely to speedup much.*" Also limited by # of proc cores & "costs of partitioning & synchronizing" – ruffin Sep 13 '17 at 14:04
  • @stenly, The Sleep(10) represents an API call that you may need to make for each line. – PRMan Jan 18 '22 at 17:44
  • Also depends on the number of cores in the processor. The more cores available, the more speedy the loop would be. By default, all the cores would be used by parallel, so the number of cores participating should be specified at the parallel call code. – Venugopal M Feb 27 '23 at 10:34
140
string[] lines = File.ReadAllLines(txtProxyListPath.Text);
List<string> list_lines = new List<string>(lines);
Parallel.ForEach(list_lines, line =>
{
    //Your stuff
});
Contango
L.B
  • 6
    Just wanted to point it out (more for the OP) so that there wasn't a misguided thought that it only works on `List` ;) – Reed Copsey Sep 03 '12 at 17:21
  • 1
    Thanks for the attention and the answer. I used a List in my code because I remove duplicate items using hash lists. With a regular array we cannot remove duplicates easily :). – SilverLight Sep 03 '12 at 20:55
  • 125
    I am confused that this answer is marked as the right answer, since there is no explanation of the original post's question "What is Parallel.ForEach and what does it exactly do?"... – fose Nov 12 '15 at 11:33
  • 6
    @fosb The problem is the question title was edited to completely change the meaning... so this answer no longer makes any sense. Having said that, it's still a poor answer – aw04 Sep 09 '16 at 14:29
46
string[] lines = File.ReadAllLines(txtProxyListPath.Text);

// No need for the list
// List<string> list_lines = new List<string>(lines); 

Parallel.ForEach(lines, line =>
{
    //My Stuff
});

This will cause the lines to be parsed in parallel, within the loop. If you want a more detailed, less "reference oriented" introduction to the Parallel class, I wrote a series on the TPL which includes a section on Parallel.ForEach.
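
Because the loop body can run for several lines at the same time, anything it shares between iterations has to be thread-safe. Here is a minimal sketch of one way to collect de-duplicated lines from inside the loop, assuming the goal is simply a set of unique entries. The sample strings and the class name are hypothetical stand-ins, and ConcurrentDictionary is one option among several, not necessarily what the asker's code needs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class SharedStateSketch
{
    static void Main()
    {
        // Hypothetical stand-in for File.ReadAllLines(...), with one duplicate entry.
        string[] lines = { "10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.1:8080", "10.0.0.3:3128" };

        // ConcurrentDictionary is safe to update from several threads at once;
        // a plain List<string> or HashSet<string> is not.
        var unique = new ConcurrentDictionary<string, byte>();

        Parallel.ForEach(lines, line =>
        {
            // TryAdd returns false for a key that is already present,
            // so duplicates are dropped without explicit locking.
            unique.TryAdd(line.Trim(), 0);
        });

        // Enumeration order is not guaranteed to match the input order.
        Console.WriteLine("Unique lines: {0}", unique.Count);
    }
}
```

With the four sample strings above, three unique entries remain. If the original order of the lines matters, it has to be tracked separately, since the parallel loop does not preserve it.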

Reed Copsey
11

For a big file, use the following code. It uses less memory, because File.ReadLines streams the lines lazily instead of loading the whole file into an array first:

Parallel.ForEach(File.ReadLines(txtProxyListPath.Text), line => {
    //Your stuff
});
Samuel LEMAITRE
5

These lines worked for me.

string[] lines = File.ReadAllLines(txtProxyListPath.Text);
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount * 10 };
Parallel.ForEach(lines, options, (item) =>
{
    //My Stuff
});
Prince Prasad
2

I would like to add a note about ParallelOptions. If you don't specify it, MaxDegreeOfParallelism defaults to -1 (no limit), so the loop may use as many threads as the scheduler provides, which can starve other work and cause problems in production. It is better to set a maximum degree of parallelism in code.

Parallel.ForEach(list_lines, new ParallelOptions { MaxDegreeOfParallelism = 2 }, line =>
{
    
});
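
To see the effect of the limit, here is a minimal sketch that counts how many iterations are running at the same moment. The counters, the 20-item input, and the Thread.Sleep(10) stand-in for slow work are assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class MaxDegreeSketch
{
    static void Main()
    {
        int[] items = Enumerable.Range(1, 20).ToArray();
        var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

        int running = 0; // iterations executing right now
        int peak = 0;    // highest value of 'running' observed so far

        Parallel.ForEach(items, options, item =>
        {
            int now = Interlocked.Increment(ref running);

            // Record a new peak with a compare-and-swap retry loop.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)))
            {
                Interlocked.CompareExchange(ref peak, now, seen);
            }

            Thread.Sleep(10); // stand-in for slow work
            Interlocked.Decrement(ref running);
        });

        Console.WriteLine("Peak concurrent iterations: {0}", peak);
    }
}
```

With MaxDegreeOfParallelism = 2 the reported peak should never exceed 2. Removing the options argument lets you compare against the default behaviour on your own machine.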
Josef