1

I have an ArrayList ids containing String objects that are IDs, and another ArrayList objs containing objects which have a string ID field. Right now I have code, to find which ids don't have a match in objs, which looks like this:

var missing = new List<string>();

foreach (MyObj obj in objs)
{
    if (!ids.Contains(obj.ID))
    {
       missing.Add(obj.ID);
    }
}

This works fine. But I rewrote it to this an exercise to better "think in LINQ":

var missing = objs.Cast<MyObj>().Select(x => x.ID).Except(ids.Cast<string>());

I expected this LINQ to be slower than the foreach + Contains approach (especially due to the Cast calls), but the LINQ runs significantly faster. What is the LINQ approach doing differently that gives the performance benefit?

Community
  • 1
  • 1
jltrem
  • 12,124
  • 4
  • 40
  • 50
  • 3
    The linq approach is not executed, that's why it's faster. `Except` is using deferred execution. Apart from that it's using a set which is efficient. – Tim Schmelter Oct 22 '13 at 20:43
  • @TimSchmelter what do you mean by this? – jltrem Oct 22 '13 at 20:45
  • 1
    Include a `missing.ToList()` and measure again. – H H Oct 22 '13 at 20:45
  • `missing` is a query not a collection. Use `foreach` or `ToList` to execute the query. – Tim Schmelter Oct 22 '13 at 20:46
  • @TimSchmelter deferred by IEnumerable instead of making a List. Thanks, making sense now. – jltrem Oct 22 '13 at 20:57
  • 1
    @jltrem: A `List` is also an `IEnumerable`. Here are some informations about deferred execution: http://stackoverflow.com/questions/7324033/what-are-the-benefits-of-a-deferred-execution-in-linq – Tim Schmelter Oct 22 '13 at 21:00
  • @HenkHolterman now that `missing` is _doing_ something it takes some time. Still significantly faster than my `foreach` + `Contains` processing. Thanks for the direction. – jltrem Oct 22 '13 at 21:00

2 Answers2

5

LINQ Except uses HashSet internally, which has O(1) Contains method performance, when it's O(n) for ArrayList. That's why it's faster.

But as Tim pointed in his comment, your Except approach does not really produce any results. It just defines a query. The query is executed as soon as you need results. And it may be executed more than once. You should add ToList() call to get List<T> explicitly:

var missing = objs.Cast<MyObj>().Select(x => x.ID).Except(ids.Cast<string>()).ToList();

By the way, why are you using ArrayList instead of generic List<T>?

MarcinJuraszek
  • 124,003
  • 15
  • 196
  • 263
  • The `ArrayList` is provided somewhere else in the system that I'm working in. Not something I have control over right now. – jltrem Oct 22 '13 at 20:48
1

Except uses a HashSet<T> (or something similar) to efficiently find what object are the same, while your code uses the less-efficient List<T>.Contains (or similar) method.

Tim S.
  • 55,448
  • 7
  • 96
  • 122