I am working on an ASP.NET MVC 4 web application, but I am not sure what the differences are between these two approaches for iterating over a list and initiating WebClient() calls:

Approach-1

    Parallel.ForEach(photos, new ParallelOptions { MaxDegreeOfParallelism = 7 }, p =>
    {
        ResourceAccountListInfo resourceAccountListInfo = new ResourceAccountListInfo();
        WebClient wc = new WebClient();

        var json = wc.DownloadString(p.url);
        resourceAccountListInfo = JsonConvert.DeserializeObject<ResourceAccountListInfo>(json);
        if (resourceAccountListInfo.operation.Details.CUSTOMFIELD.Count > 0)
        {
            List<CUSTOMFIELD> customfield = resourceAccountListInfo.operation.Details.CUSTOMFIELD
                .Where(a => a.CUSTOMFIELDLABEL.ToLower() == "name")
                .ToList();
            if (customfield.Count == 1)
            {
                PMresourcesOnly.Add(resourceAccountListInfo.operation.Details);
            }
        }
        //code goes here
    });

Approach-2

    foreach (Photo p in photos)
    {
        Task.Factory.StartNew(() =>
        {
            ResourceAccountListInfo resourceAccountListInfo = new ResourceAccountListInfo();
            WebClient wc = new WebClient();

            var json = wc.DownloadString(p.url);
            resourceAccountListInfo = JsonConvert.DeserializeObject<ResourceAccountListInfo>(json);
            if (resourceAccountListInfo.operation.Details.CUSTOMFIELD.Count > 0)
            {
                List<CUSTOMFIELD> customfield = resourceAccountListInfo.operation.Details.CUSTOMFIELD
                    .Where(a => a.CUSTOMFIELDLABEL.ToLower() == "name")
                    .ToList();
                if (customfield.Count == 1)
                {
                    PMresourcesOnly.Add(resourceAccountListInfo.operation.Details);
                }
            }
            //code goes here
        });
    }

thanks

John John
  • Do all the threads replace `resourceAccountListInfo`? That seems wrong. – Scott Chamberlain Jun 29 '16 at 00:25
  • The difference is: Parallel.ForEach waits until all "photos" have finished, then continues. With Task.Factory.StartNew, execution always continues, whether the photos have finished or not. –  Jun 29 '16 at 00:25
  • @ScottChamberlain Sorry, I showed only the part of the code that is related to my question. There is extra processing on resourceAccountListInfo inside each iteration. – John John Jun 29 '16 at 00:29
  • They are still all operating on a shared variable which will introduce bugs – Scott Chamberlain Jun 29 '16 at 00:30
  • @ScottChamberlain Apologies for not showing the full code. It is not a shared variable; it is a local variable inside each iteration. I have updated my code accordingly. – John John Jun 29 '16 at 00:34
  • @ScottChamberlain once showed me about TPL DataFlow, and you should have a look at that too if you are considering these two above :) Makes UI apps like WPF a breeze – Alexandru Jun 29 '16 at 02:36
  • @Alexandru Actually I answered an earlier question of his about TPL Dataflow, and I copied and pasted a large chunk of the answer from the answer I gave to your question :) – Scott Chamberlain Jun 29 '16 at 03:15
  • @Alexandru So do you suggest using TPL over Parallel.ForEach? – John John Jun 29 '16 at 11:30
  • @johnG Absolutely, check out Scott's answer here: http://stackoverflow.com/questions/37973566/finalizer-for-parallel-foreach Some benefits are: it's awaitable, full cancellation token support, multi-threadability and an adjustable level of parallelism, better handling of exceptions, no thread abort raising, works perfectly with async... basically, what the whole async and Task world should have had all along. When you use it and call await, it does not block your UI thread, and you can have it come back to your UI thread upon finishing for more logical tasks to be done without using a dispatcher. – Alexandru Jun 29 '16 at 11:49
  • @Alexandru This seems to be a confusing topic now: some suggest using Parallel.ForEach, others suggest using Task.WhenAll, so I am not sure which one to use and why. – John John Jun 29 '16 at 14:21
  • Well, `Parallel.ForEach` is a more elegant way to structure such code than creating a new task for each code block. However, I would suggest you take a look at a third option and use it instead: TPL DataFlow with `new ActionBlock<ObjectType>`, where you replace `ObjectType` with `Photo`. You will need to add a NuGet package to your project in order to use TPL DataFlow code. – Alexandru Jun 29 '16 at 15:00
  • Ultimately, it's up to you to choose the best solution for your code as you see fit. It's always hard to determine what the best fit is, so you'll need to decide based on how the different solutions work; that will require some trial and error, and performance metrics, to see which one is best, I think :) I'm just trying to give my two cents though! – Alexandru Jun 29 '16 at 15:31
  • @Alexandru But I cannot understand why people say that using parallel processing (such as Parallel.ForEach) inside a .NET web application is bad practice in general, while using concurrency such as Task.WhenAll is good practice. – John John Jun 29 '16 at 16:02
  • @johnG There are a lot of ways to make code do the same thing, and I don't think you can generalize and say that Parallel.ForEach is worse than Task.WhenAll or vice versa *except* in certain situations... and even in those, they will probably have the same end result anyway, so it does not matter much. It may matter when it comes to performance. You should check your CPU core usage in Task Manager when running either, to see whether you are using all of your CPU capacity. – Alexandru Jun 29 '16 at 20:37
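
The completion difference raised in the comments above can be sketched roughly as follows. This is only an illustration: `CompletionSketch`, its method names, and the `Thread.Sleep` call standing in for real work are all hypothetical and not part of the question's code.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CompletionSketch
{
    // Parallel.ForEach blocks the caller until every iteration has finished.
    public static int CountWithParallelForEach(IEnumerable<int> items)
    {
        int done = 0;
        Parallel.ForEach(items, item =>
        {
            Thread.Sleep(5); // simulated work (assumption)
            Interlocked.Increment(ref done);
        });
        return done; // every iteration has completed by the time we get here
    }

    // StartNew returns immediately, so the tasks are kept and waited on explicitly.
    public static int CountWithTasks(IEnumerable<int> items)
    {
        int done = 0;
        Task[] tasks = items.Select(item => Task.Factory.StartNew(() =>
        {
            Thread.Sleep(5); // simulated work (assumption)
            Interlocked.Increment(ref done);
        })).ToArray();
        Task.WaitAll(tasks); // without this wait, some tasks might still be running
        return done;
    }
}
```

Approach-2 in the question never stores or waits on the tasks it starts, so any code after its `foreach` may run before the downloads have finished.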

1 Answer

Parallel.ForEach provides a convenience for iterating through a collection. Even though Task.Factory.StartNew can be called within a loop, there's nothing intrinsic to it that indicates that it will be used in a loop. You might just start one task.

Because Parallel.ForEach assumes a set of parallel operations, it provides you with ParallelOptions which allow you to specify the number of concurrent operations within the scope of that one loop. That's useful because in some cases we might not care how many concurrent operations there are, but in other cases we might. Take a look at this answer which describes using a Semaphore to limit the number of concurrent tasks. Setting MaxDegreeOfParallelism is much easier than using a Semaphore.
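
To make the comparison concrete, here is a minimal sketch of both ways of capping concurrency. The `ThrottlingSketch` class, the `Process` helper, and the use of integers as work items are assumptions made for illustration; `SemaphoreSlim` is used as one possible semaphore type.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottlingSketch
{
    // Hypothetical stand-in for the real per-item work (e.g. a download).
    private static int Process(int item)
    {
        Thread.Sleep(10);
        return item * 2;
    }

    // Option A: let Parallel.ForEach cap the concurrency.
    public static int[] WithParallelForEach(IList<int> items, int maxConcurrency)
    {
        var results = new int[items.Count];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrency };
        Parallel.ForEach(Enumerable.Range(0, items.Count), options, i =>
        {
            results[i] = Process(items[i]); // each iteration writes only its own slot
        });
        return results;
    }

    // Option B: one task per item, throttled by hand with a semaphore.
    public static int[] WithSemaphore(IList<int> items, int maxConcurrency)
    {
        var results = new int[items.Count];
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            Task[] tasks = Enumerable.Range(0, items.Count)
                .Select(i => Task.Factory.StartNew(() =>
                {
                    gate.Wait(); // at most maxConcurrency tasks get past this point
                    try { results[i] = Process(items[i]); }
                    finally { gate.Release(); }
                }))
                .ToArray();
            Task.WaitAll(tasks); // wait before the semaphore is disposed
        }
        return results;
    }
}
```

Both methods produce the same results; the second needs the explicit wait, release, and task bookkeeping that `MaxDegreeOfParallelism` handles for you.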

Also, Parallel.ForEach returns a ParallelLoopResult, which tells you whether the loop ran to completion (IsCompleted) and, if it was broken out of, the lowest iteration at which Break was called (LowestBreakIteration). It would take a little more work to get that same visibility into a collection of tasks. You can also stop the loop with one method call (ParallelLoopState.Stop or Break) rather than stopping multiple tasks.
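
As a rough sketch of that idea, the loop body can accept a `ParallelLoopState` and the caller can inspect the returned `ParallelLoopResult`. The `LoopResultSketch` class and the `stopValue` parameter are hypothetical names used only for this example.

```csharp
using System.Threading.Tasks;

public static class LoopResultSketch
{
    // Returns true when every iteration ran, false when the loop was stopped early.
    public static bool RunUntil(int[] items, int stopValue)
    {
        ParallelLoopResult result = Parallel.ForEach(items, (item, state) =>
        {
            if (item == stopValue)
            {
                state.Stop(); // one call asks the whole loop to stop starting new iterations
            }
        });
        return result.IsCompleted;
    }
}
```

For instance, `LoopResultSketch.RunUntil(new[] { 1, 2, 3 }, 2)` reports that the loop did not complete, while a `stopValue` that never occurs in the array lets it run to completion.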

Scott Hannen
  • So you favor using Parallel.ForEach instead of Task.Factory.StartNew, is this correct? Second question: when using Parallel.ForEach in my case, do I need to explicitly use a lock to make sure there is no data corruption, or is this not required? – John John Jun 29 '16 at 11:29
  • I'd use `Parallel.ForEach` if I was dealing with a collection I wanted to process in parallel. Otherwise I might start a `Task`. You would need to use some sort of locking (or other ways of managing concurrency) if the parallel tasks are operating on something that requires it, like a shared property that they can't update simultaneously. But that locking would more likely be in the objects I'm working with - writing them so that they support concurrency - rather than in the loop itself. – Scott Hannen Jun 29 '16 at 12:06
  • What exactly do you mean by "Otherwise I might start a Task"? And when do I need to use a lock with Parallel.ForEach or with Tasks? Can you provide more details, please? – John John Jun 29 '16 at 14:22
  • That is a bit in-depth. I'd start with reading about when to use a lock. It applies to anything multithreaded. – Scott Hannen Jun 29 '16 at 14:30
  • So do I need to use a lock even if I choose to go with Task.WhenAll? – John John Jun 29 '16 at 16:03
  • Using a lock or not using a lock isn't about how you start parallel tasks. It's about what those tasks are doing - are they reading, writing, or performing other operations that only one thread at a time should do? If they are, then a `lock` can ensure that each thread waits its turn before executing certain code instead of multiple threads doing it at the same time. But usually the task or parallel loop isn't responsible for that. The lock would be (for example) on the property that gets updated, to ensure that two threads don't try to modify it simultaneously or write while another reads. – Scott Hannen Jun 29 '16 at 17:48
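
A minimal sketch of the locking point discussed in these comments, assuming the shared state is a result list that every iteration adds to (much like `PMresourcesOnly` in the question, whose actual type is not shown). `SharedStateSketch` and its method names are hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SharedStateSketch
{
    // Guard a plain List<T> with a lock around the one statement that mutates it.
    public static List<int> CollectWithLock(IEnumerable<int> items)
    {
        var results = new List<int>();
        var sync = new object();
        Parallel.ForEach(items, item =>
        {
            int computed = item * item; // independent work needs no lock
            lock (sync)
            {
                results.Add(computed); // only one thread at a time may add
            }
        });
        return results;
    }

    // Alternative: a collection that is already safe for concurrent writers.
    public static List<int> CollectWithConcurrentBag(IEnumerable<int> items)
    {
        var results = new ConcurrentBag<int>();
        Parallel.ForEach(items, item => results.Add(item * item));
        return results.ToList();
    }
}
```

Either version would look the same if the work were started with `Task.Factory.StartNew` instead; the protection belongs to the shared collection, not to the way the work is launched. Neither version preserves the input order.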