1

I have checked distinct performance over nested loop with any. But Distinct Method is much faster then nested loop.

var customers = new List<Customer>();

for (var i = 1; i <= 100000; i++)
{
    var id = (int)Math.Floor((decimal)i / 10);
    var customer = new Customer()
    {
        FirstName = $"Name {i}",
        ID = id,
        LastName = $"Last {i}"
    };

    customers.Add(customer);
}

System.Console.WriteLine($"Outer Loop start :{DateTime.UtcNow}");

var ids = new List<int>();

customers.ForEach(_=> {
    ids.Add(_.ID);
});

var uniqueIds = ids.Distinct();

System.Console.WriteLine($"Outer Loop End :{DateTime.UtcNow}");

System.Console.WriteLine($"Nested Loop start :{DateTime.UtcNow}");

var oids = new List<int>();

customers.ForEach(_ => {
    if (!oids.Any(i => i == _.ID))
    {
        oids.Add(_.ID);
    }
});
System.Console.WriteLine($"Nested Loop End :{DateTime.UtcNow}");

Result: Outer Loop start :6/20/2020 4:15:31 PM Outer Loop End :6/20/2020 4:15:31 PM Nested Loop start :6/20/2020 4:15:32 PM Nested Loop End :6/20/2020 4:15:46 PM

It just took 1 second for Outerloop but 14 seconds for nested loop. How Distinct is much faster than using "Any" Function in foreach ?

Guru Stron
  • 102,774
  • 10
  • 95
  • 132
Mahendra K
  • 19
  • 2
  • i am just trying to find out unique ID's in customer list. – Mahendra K Jun 20 '20 at 16:20
  • 2
    The looping is the same. Distinct I think uses a hash to get keys which is faster with a large list of items. So you are probably searching a list which performance is N/2 while distinct is using hash which is log2(N). – jdweng Jun 20 '20 at 16:24
  • 1
    Also, the `Stopwatch` class is more accurate for measuring execution times. See: https://stackoverflow.com/questions/2923283/stopwatch-vs-using-system-datetime-now-for-timing-events#:~:text=It%27s%20better%20to%20use%20the,Stopwatch%20s%20%3D%20Stopwatch. – Rufus L Jun 20 '20 at 16:27
  • FYI, there is no "inner loop" shown, only an `if` statement in one of the loops. – Rufus L Jun 20 '20 at 16:35
  • 1
    Yeah @RufusL, i am using Any method in that if statement, which i considered as a nested loop. – Mahendra K Jun 20 '20 at 17:21
  • Why not `var oids = new HashSet(customers.Select(c=>c.ID));` Beats the delegate and closures required for `ForEach`. You can plop a `ToList` at the end if you really need a `List` – pinkfloydx33 Jun 20 '20 at 17:37

1 Answers1

9

First of all it is faster cause Distinct actually does almost nothing - uniqueIds is not materialized IEnumerable<int> (you can check it adding .Select(c => {Console.WriteLine(c);return c;}) between ids and .Distinct() for example), change uniqueIds declaration line to :

var uniqueIds = ids.Distinct().ToList();

Secondary for proper benchmarking I would recommend using BenchmarkDotNet, for your case you can compose for example following benchmark (removed/reorganized some code cause it is not relevant to the actual benchmarked stuff):

public class GetDistinctIds
{
    private static readonly List<int> CustomerIds = Enumerable.Range(0, 100_000)
       .Select(i => (int) Math.Floor((decimal) i / 10))
       .ToList();

    [Benchmark]
    public List<int> Distinct() => CustomerIds.Distinct().ToList();

    [Benchmark]
    // just for fun =)
    // returning object so BenchmarkDotNet won't complain, actually non-materialized IEnumerable<int>
    public object DistinctNoToList() => CustomerIds.Distinct();

    [Benchmark]
    public List<int> HashSet() => new HashSet<int>(CustomerIds).ToList();

    [Benchmark]
    public List<int> NestedLoops()
    {
        var oids = new List<int>();

        CustomerIds.ForEach(id =>
        {
            if (!oids.Any(i => i == id))
            {
                oids.Add(id);
            }
        });
        return oids;
    }
}

Which gives on my machine next results:

|           Method |                Mean |             Error |            StdDev |
|----------------- |--------------------:|------------------:|------------------:|
|         Distinct |     1,842,519.98 ns |     16,088.362 ns |     17,882.171 ns |
| DistinctNoToList |            17.19 ns |          0.412 ns |          1.070 ns |
|          HashSet |     1,911,107.12 ns |     31,699.290 ns |     29,651.535 ns |
|      NestedLoops | 4,100,604,547.06 ns | 78,815,290.539 ns | 80,937,500.636 ns |

And finally to the "Why".

Distinct uses internally DistinctIterator which in it's turn uses internal Set class, described as A lightweight hash set, which, as I understand should be comparable in terms of search Big-O complexity with hashtable, resulting in constant search time in the best/average case, while List will have search (the !oids.Any) complexity of O(n).

Guru Stron
  • 102,774
  • 10
  • 95
  • 132