1
class Student
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

class Program
{
    private static object _lockObj = new object();

    static void Main(string[] args)
    {

        List<int> collection = Enumerable.Range(1, 100000).ToList();


         
        List<Student> students = new List<Student>(100000);
        var options = new ParallelOptions()
        {
            MaxDegreeOfParallelism = 1
        };
        var sp = System.Diagnostics.Stopwatch.StartNew();
        sp.Start();
        Parallel.ForEach(collection, options, action =>
        {
            lock (_lockObj)
            {
                 var dt = collection.FirstOrDefault(x => x == action);
                 if (dt > 0)
                 {
                    Student student = new Student();
                    student.ID = dt;
                    student.Name = "Zoyeb";
                    student.Email = "ShaikhZoyeb@Gmail.com";

                    students.Add(student);
                    Console.WriteLine(@"value of i = {0}, thread = {1}", 
                    action,Thread.CurrentThread.ManagedThreadId);
                }
            }
        });
        sp.Stop();

        double data = Convert.ToDouble(sp.ElapsedMilliseconds / 1000);
        Console.WriteLine(data);
         
    }
}

I want to loop through 100000 records as quickly as possible i tried foreach loop but it is not quite good for loop through 100000 records then after i tried to implement Parallel.ForEach() that improved my performance , in real scenario i will have collection of Ids and i need to lookup into collection whether id exits or not if exits then add. performance is hitting in condition when i comment condition it took around 3 seconds to execute and when i uncomment condition it took around 24 seconds so my question is there any way i can boost my performance by looking up id in collection

         //var dt = collection.FirstOrDefault(x => x == action);
         //if (dt > 0)
         //{
            Student student = new Student();
            student.ID = 1;
            student.Name = "Zoyeb";
            student.Email = "ShaikhZoyeb@Gmail.com";

            students.Add(student);
            Console.WriteLine(@"value of i = {0}, thread = {1}", 
            action,Thread.CurrentThread.ManagedThreadId);
        //}
Zoyeb Shaikh
  • 301
  • 1
  • 11
  • 3
    You have a lock inside parallel.foreach, which means each iteration will run sequentially, and not in parallel. Basically, you took parallel.foreach and converted it back into an overly complex normal foreach. You will have to restructure your code so that you can remove that lock. – Lasse V. Karlsen Jul 30 '20 at 10:59
  • can you show me an example of that @Lasse V. Karlsen ? – Zoyeb Shaikh Jul 30 '20 at 11:02
  • Unfortunately no, only on iPhone right now. You need to use collection types that can be manipulated and used by multiple threads at the same time. – Lasse V. Karlsen Jul 30 '20 at 11:07
  • Best way to handle performance issues is to first profile/measure, and then change the code. Don't start making changes unless you know where the bottleneck is. – Lasse V. Karlsen Jul 30 '20 at 11:11
  • yes i am totally agree with you @ Lasse V. Karlsen – Zoyeb Shaikh Jul 30 '20 at 11:15
  • This code is horrendous! You've got `MaxDegreeOfParallelism = 1`, you're doing `collection.FirstOrDefault(x => x == action)` over the `collection` (which is O(n^2)) and you're doing a `lock` inside a `Parallel.ForEach`. Surely your benchmarking with `Stopwatch` would have told you that this is bad. – Enigmativity Jul 30 '20 at 11:29
  • Oh, and you're doing a `Console.WriteLine` inside a loop that you're timing with a `Stopwatch`. That'll slow it down too. – Enigmativity Jul 30 '20 at 11:30
  • `action` is a value taken from a collection. There's no point in checking it out. An expression with `FirstOrDefault` is simply not necessary! – Alexander Petrov Jul 30 '20 at 11:32
  • this not actual scenario @Alexander Petrov this is why here it is not necessary!. in actual scenario i will have to lookup into collection – Zoyeb Shaikh Jul 30 '20 at 11:40
  • @ZoyebShaikh - We can only answer based on what you post. If the real-world problem is more complex then you should post a more complex question. – Enigmativity Jul 30 '20 at 11:44

2 Answers2

1

Your original code is doing a lock inside a Parallel.ForEach. That's essentially taking the parallel code and forcing it to run in series.

It takes 40 seconds on my machine.

It is really the equivalent of doing this:

    foreach (var action in collection)
    {
            var dt = collection.FirstOrDefault(x => x == action);
            if (dt > 0)
            {
                Student student = new Student();
                student.ID = dt;
                student.Name = "Zoyeb";
                student.Email = "ShaikhZoyeb@Gmail.com";

                students.Add(student);
            }
    }

Which also takes 40 seconds.

However, if you just do this:

    foreach (var action in collection)
    {
        Student student = new Student();
        student.ID = action;
        student.Name = "Zoyeb";
        student.Email = "ShaikhZoyeb@Gmail.com";

        students.Add(student);
    }

That takes 1 millisecond to run. It's roughly 40,000 times quicker.

In this case you can get much faster loops by iterating your collection once, not in a nested way and not using Parallel.ForEach.


My ap0ologies for missing that the bit about the id not existing.

Try this:

    HashSet<int> hashSet = new HashSet<int>(collection);

    List<Student> students = new List<Student>(100000);

    var sp = System.Diagnostics.Stopwatch.StartNew();
    sp.Start();
    foreach (var action in collection)
    {
        if (hashSet.Contains(action))
        {
            Student student = new Student();
            student.ID = action;
            student.Name = "Zoyeb";
            student.Email = "ShaikhZoyeb@Gmail.com";

            students.Add(student);
        }
    }
    sp.Stop();

That runs in 3 milliseconds.

An alternative is to use a join like this:

    foreach (var action in
        from c in collection
        join dt in collection on c equals dt
        select dt)
    {
        Student student = new Student();
        student.ID = action;
        student.Name = "Zoyeb";
        student.Email = "ShaikhZoyeb@Gmail.com";

        students.Add(student);
    }

That runs in 25 milliseconds.

Enigmativity
  • 113,464
  • 11
  • 89
  • 172
  • i know it would be much faster if i remove the FirstOrDefault() condition however we have selection helper in real case user will selection range of rows and process that's why i have to lookup in collection and i clearly mention in question that i know if FirstOrDefault() removed then it would work drastically fast – Zoyeb Shaikh Jul 30 '20 at 11:47
0

Problem 1

You are using a lock instead of a concurrent collection in a parallel foreach. Forcing the parallel foreach to wait, to access the lock and thus execution will be one-by-one.

Change your list to a ConcurrentBag, and remove the lock from the ParallelForEach

// using System.Collections.Concurrent; // at the top
var students = new ConcurrentBag<Student>()

Problem 2

FirstOrDefault() Isn't very performant if you want to select by Id. Use a Dictionary. Since the dictionary does a Hashmatch its much faster then FirstOrDefault. See this questoin.

Change your collection to be a dictionary:

var collection = Enumerable.Range(1, 100000)
    .ToDictionary(x=> x);

Change the access in your loop to:

if(collection.TryGetValue(action, out var dt))
{
  //....
}

Problem 3

Stopwatch is not a benchmarking tool. Please use Benchmark.Net or a different library.

Preben Huybrechts
  • 5,853
  • 2
  • 27
  • 63
  • This is a bad solution. Using `ConcurrentBag` is one further hack on top of an existing raft of hacks. – Enigmativity Jul 30 '20 at 11:32
  • it took me around 5 seconds to iterate , much obliged @Preben Huybrechts – Zoyeb Shaikh Jul 30 '20 at 11:33
  • what do you mean @Enigmativity i didn't get it? – Zoyeb Shaikh Jul 30 '20 at 11:34
  • @Enigmativity, please explain? As I've [read](https://devblogs.microsoft.com/pfxteam/faq-are-all-of-the-new-concurrent-collections-lock-free/) Concurrent bag is quite performant. – Preben Huybrechts Jul 30 '20 at 11:38
  • @ZoyebShaikh - Take a look at my answer. It runs in 1 **millisecond**. – Enigmativity Jul 30 '20 at 11:38
  • @PrebenHuybrechts - Yes, it is pretty good, but it still isn't as good as optimizing the loop so that it is `O(n)` rather than `O(n^2)`. – Enigmativity Jul 30 '20 at 11:38
  • @PrebenHuybrechts - I wish I could downvote twice. Calling `Enumerable.Range(1, 100000).ToDictionary(x=> x)` builds a completely unnecessary dictionary that maps each integer from 1 to 100000 to itself. It's inane. It's insanely quicker to go `var dt = action;`. – Enigmativity Jul 30 '20 at 11:42
  • @Enigmativity, I agree te Enumerable.Repeat is in the orignal. Maybe It's replaced to post on stackoverflow, with something that actual creates a List with X items. Like a database query. – Preben Huybrechts Jul 30 '20 at 11:43
  • @PrebenHuybrechts - In the original what? – Enigmativity Jul 30 '20 at 11:46
  • In the orignal question from OP. I'm guessing OP has something different, where he creates the Enumerable.Range() – Preben Huybrechts Jul 30 '20 at 11:48
  • @PrebenHuybrechts - so can we remove the lock completely? I'm assuming we also need to set the degree of parallelism to something other than 1, or having a ConcurrentBag serves no purpose? – Rich N Jul 30 '20 at 11:50
  • @RIchN, Yes the MaxDegreeOfParallelism should be increased, or use a normal for/foreach. The lock surves no purpose when using a concurrent collection. – Preben Huybrechts Jul 30 '20 at 11:52