4

I created a list of 1,000,000 integers (range:[0,99]), and random select one target value, and count from the list using two method:

  1. lst.Count(v => v == target);
  2. lst.Where(v => v == target).Count();

I suppose the second way is doing more work than the first one, but benchmark shows that using where before count is 2~3 times faster than Count(contidion), why?

Dotnet 5.0 is used, by the way.


I am using BenchmarkDotNet, and I changed the list length to 100,000,000, the resut is the same. I also tried this.numbers.Where(v => v == target).Count(v => v == target), it's still faster than the first one.

This is the code I used:

[MemoryDiagnoser]
public class TestLinq
{
    private readonly List<int> numbers;
    private readonly Random rand = new Random();

    public TestLinq()
    {
        this.numbers = new List<int>();
        this.Prepare();
    }

    private void Prepare()
    {
        for (var i = 0; i < 100000000; i++)
        {
            this.numbers.Add(rand.Next(0, 99));
        }
    }

    [Benchmark]
    public void CountDirectly()
    {
        var target = rand.Next(0, 99);
        var result = this.numbers.Count(v => v == target);
    }

    [Benchmark]
    public void WhereThenCount()
    {
        var target = rand.Next(0, 99);
        var result = this.numbers.Where(v => v == target).Count();
    }
}

the result is

Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
CountDirectly 735.0 ms 13.47 ms 28.42 ms 720.6 ms - - - 464 B
WhereThenCount 212.9 ms 2.59 ms 2.16 ms 213.3 ms - - - 480 B

and if change to this.numbers.Where(v => v == target).Count(v => v == target), it's still faster than the fist one

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
CountDirectly 723.2 ms 14.07 ms 14.45 ms - - - 464 B
WhereThenCount 419.2 ms 4.34 ms 4.06 ms - - - 560 B
Brian Holsen
  • 350
  • 3
  • 10
  • The second version *is* doing more work, iterating twice. Once to filter values, and once to count the results. 1M integers isn't a lot of data and simple comparisons are essentially 1 CPU operation. Any benchmark is going to measure the iteration overhead, not the comparison. – Panagiotis Kanavos Mar 22 '21 at 09:23
  • 2
    If you want to know what's really going on though, you'll have to post your benchmark code. Unless you use BenchmarkDotNet, you may be measuring your own test code, not the LINQ operations. 1M integers is very little data and may not even fill the CPU's cache. A brute-force search like this only needs a few milliseconds. You may be experiencing caching and warmup effects instead of measuring the real performance. BenchmarkDotNet will repeat each option enough times to eliminate warmup effects and the results are consistent with each other – Panagiotis Kanavos Mar 22 '21 at 09:26
  • 3
    There is just one iteration of the collection in both cases. It's most likely due to `Where` using a better iterator for List than Count. See this question: https://stackoverflow.com/questions/25787555/linq-performance-count-vs-where-and-count – MAV Mar 22 '21 at 09:33
  • This topic covered in this video https://youtu.be/8-NAwKYXMzs?t=187 . Basically .Count() method has some optimizations based on the collection type where you called it – dantey89 Mar 22 '21 at 17:07

0 Answers0