I needed a quick and dirty way to count the elements of an array, but only those that actually contain a value. After a little tinkering I came up with this:

int count = 0;
for (int i = 0; i < array.Length; i++)
{
    if (array[i] != null)
        count++;
}
return count;

A quick way to solve the problem, with no unnecessary detail. Then, after searching Stack Overflow, I noticed that this question had been asked before: Count non null elements in an array of lists

A very elegant answer: Use Enumerable.Count()! The mighty gods of computer land gave us this precious tool called LINQ and I think we are using it more than we should.

array.Count(x => x != null)

It looks very sweet and is easy to understand.

To test its performance I created a 100-million-element array filled with the string "yes", leaving just one element out. After running the test with the Stopwatch class, the results were quite surprising:

Enumerable.Count: ~600 ms
A dirty code block: ~60 ms
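
For reference, a test along these lines might look roughly like the sketch below. It is not the exact code that produced the numbers above: the names CountTiming, CountWithLoop, CountWithLinq and BuildTestArray are illustrative, and placing the single null in the middle of the array is an assumption.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class CountTiming
{
    // The hand-written loop from the question.
    public static int CountWithLoop(string[] array)
    {
        int count = 0;
        for (int i = 0; i < array.Length; i++)
        {
            if (array[i] != null)
                count++;
        }
        return count;
    }

    // The LINQ version from the question.
    public static int CountWithLinq(string[] array)
    {
        return array.Count(x => x != null);
    }

    // Builds an array of "yes" strings with one null element.
    // Assumption: size > 0, and the null sits in the middle.
    public static string[] BuildTestArray(int size)
    {
        var array = new string[size];
        for (int i = 0; i < size; i++)
            array[i] = "yes";
        array[size / 2] = null;
        return array;
    }

    static void Main()
    {
        // 100 million references need roughly 800 MB on a 64-bit runtime.
        string[] array = BuildTestArray(100_000_000);

        var sw = Stopwatch.StartNew();
        int linqCount = CountWithLinq(array);
        sw.Stop();
        Console.WriteLine($"Enumerable.Count: {sw.ElapsedMilliseconds} ms ({linqCount} non-null)");

        sw.Restart();
        int loopCount = CountWithLoop(array);
        sw.Stop();
        Console.WriteLine($"for loop: {sw.ElapsedMilliseconds} ms ({loopCount} non-null)");
    }
}
```

A single timed run like this includes JIT compilation of whichever method runs first, so the absolute numbers should be read with some caution.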

That is a very big difference when the code is executed repeatedly. What might be the reason behind this? I don't think any engineer at Microsoft is as stupid as me.

fyb
    It's not about being smart or not; it's about something specialized (your array implementation) versus something generalized (works on any `IEnumerable`). When you really care about performance, LINQ is in most cases not the way to go (it's mostly about ease of use and generality). That said, also keep in mind that good performance tests are hard to do: they should ideally involve warm-up runs followed by multiple measured runs, discarding outliers, etc. (there are frameworks that do all this for you). – Jul 19 '21 at 06:53
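
The warm-up-and-repeat idea from the comment above could be sketched by hand roughly as follows. This is only an illustration, not a replacement for a proper benchmarking framework such as BenchmarkDotNet; the names RepeatedTiming and MedianMilliseconds, the array size, and the warm-up and run counts are all assumptions chosen for the example.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class RepeatedTiming
{
    // Runs `action` untimed `warmups` times (so JIT compilation is not measured),
    // then times `runs` repetitions and returns the median in milliseconds.
    // Taking the median is a simple way to damp outliers. Assumes runs > 0.
    public static double MedianMilliseconds(Action action, int warmups, int runs)
    {
        for (int i = 0; i < warmups; i++)
            action();

        var samples = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            samples[i] = sw.Elapsed.TotalMilliseconds;
        }

        Array.Sort(samples);
        return samples[runs / 2]; // middle sample (upper median for even counts)
    }

    // The hand-written loop from the question.
    public static int CountWithLoop(string[] array)
    {
        int count = 0;
        for (int i = 0; i < array.Length; i++)
        {
            if (array[i] != null)
                count++;
        }
        return count;
    }

    static void Main()
    {
        string[] array = Enumerable.Repeat("yes", 10_000_000).ToArray();
        array[0] = null;

        int sink = 0; // accumulates results so the counted values are actually used
        double linqMs = MedianMilliseconds(() => sink += array.Count(x => x != null), 3, 11);
        double loopMs = MedianMilliseconds(() => sink += CountWithLoop(array), 3, 11);

        Console.WriteLine($"LINQ: {linqMs:F1} ms, loop: {loopMs:F1} ms (sink = {sink})");
    }
}
```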

1 Answer

What might be the reason behind this?

Your for loop is pretty easy for the JIT compiler to optimize really heavily. It's simple array access, a comparison and an increment. There's very little abstraction involved.

The LINQ approach involves three abstractions:

  • The array implementing IEnumerable<T>
  • Each iterator over the array implementing IEnumerator<T>
  • The predicate, expressed as a delegate
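
To make those three layers concrete, here is a rough sketch of the work a call like array.Count(x => x != null) has to do. It is an approximation for illustration, not the actual library source, and the names CountSketch and CountLikeLinq are made up for this example.

```csharp
using System;
using System.Collections.Generic;

static class CountSketch
{
    // Approximation of a predicate-based count over any sequence.
    public static int CountLikeLinq<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        int count = 0;

        // 1. The array is only seen through the IEnumerable<T> interface.
        // 2. GetEnumerator() returns an IEnumerator<T>; MoveNext() and Current
        //    are interface calls made once per element.
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            while (e.MoveNext())
            {
                // 3. The predicate is a delegate invocation per element.
                if (predicate(e.Current))
                    count++;
            }
        }
        return count;
    }

    static void Main()
    {
        string[] array = { "yes", null, "yes" };
        Console.WriteLine(CountLikeLinq(array, x => x != null)); // prints 2
    }
}
```

Compared with the for loop, each element costs two interface calls and a delegate call instead of a direct array read and a comparison.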

Given all of that, I'm not massively surprised at a 10x speed difference. And it won't stop me using Count in most cases... because I'd rarely expect that 10x speed difference to be significant in the broader context of an application. I'd expect generating/reading 100 million elements of data to take considerably longer than 600ms, so the Count part doesn't really need to be massively efficient. Of course, that's a generalized expectation - and for any given application where performance matters, it's worth measuring to see which parts are actually important.

If you happen to be writing an application where this does matter, then don't use LINQ for that part of the code - but don't throw it out in other places, where the benefit of readability outweighs the performance cost.
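
If a hot path does need the loop, one option is to hide it behind a small helper so the call site stays about as readable as the LINQ version. A minimal sketch, assuming a hypothetical extension method named CountNonNull (not part of the framework):

```csharp
using System;

static class ArrayCountExtensions
{
    // Hypothetical helper: the hand-written loop behind a readable name.
    public static int CountNonNull<T>(this T[] array) where T : class
    {
        int count = 0;
        for (int i = 0; i < array.Length; i++)
        {
            if (array[i] != null)
                count++;
        }
        return count;
    }
}

static class Program
{
    static void Main()
    {
        string[] array = { "yes", null, "yes", "yes" };
        Console.WriteLine(array.CountNonNull()); // prints 3
    }
}
```

Whether this actually helps in a given application is still something to measure rather than assume.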

Jon Skeet