5

This question is related to this one, but not entirely the same I think.

Given:

class Foo
{
  public string Bar { get; set; }
}
...
var c1 = new List<Foo>() { ... };
var c2 = new List<Foo>() { ... };

The following 2 loops give the same result:

  foreach (var item in c2.Where(f => c1.Any(f1 => f1.Bar.Equals(f.Bar))))
  { ... }

  foreach (var item in c2.Where(f => c1.Select(f1 => f1.Bar).Contains(f.Bar)))
  { ... }

Are they equally fast?

The difference with the other question, is whether the extra Select statement here, changes the importance of the nature of the underlying collection.

In other words: does this Contains:

foos.Contains(foo1)

act on the same "kind of collection" as this one:

foos.Select(f=>f.Bar).Contains(foo1.Bar)

My possible -naive- thought could be: "Once we're behind Linq's Select, everything is just 'Lists', so both the Any and Contains are O(n)."

Community
  • 1
  • 1
jan
  • 1,581
  • 2
  • 19
  • 34
  • 5
    Pretty darn close to your question: http://stackoverflow.com/questions/4445219/linq-ring-any-vs-contains-for-huge-collections – George Johnston Jun 25 '13 at 16:02
  • Actually you did answer yourself (and you can get confirmation with a breakpoint in debug... – Adriano Repetti Jun 25 '13 at 16:06
  • There is a slight difference between the two cases in that you are checking if the strings are equal and the other you are checking if one string contains the other string. – Kevin Jun 25 '13 at 16:09
  • You should be using a `Join` to do this. Both of these proposed solutions are inperformant as they both are doing a linear search on `f1` for each item in `c2`. A `Join` can avoid that and do it in O(n+m) time instead of O(n*m) time. – Servy Jun 25 '13 at 16:14
  • 2
    @Kevin No. In one case he's seeing if there is at least one item where the two strings are equal, in the second it's seeing if a sequence contains a particular item, where contains is saying that there is at least one item equal to the one provided. There is no functional difference between these two queries. – Servy Jun 25 '13 at 16:18
  • @Servy +1 you are correct of course. I read through it too quickly evidently. :) – Kevin Jun 25 '13 at 19:13

2 Answers2

19

Both the queries are fundamentally implementing the same algorithm. They will each iterate c1 for each item in c2, compare the Bar properties of the two objects, and return as soon as a match is found. The asymptotic complexity of the two cases is the same, meaning they'll both scale equally well (or equally bad, as the case happens to be) as the size of the two sets increase.

There may be a minor difference between the two in the overhead associated with one method over the other, but the difference will not be significant, and they will be smaller and smaller as the size of the collections increase. There isn't any real performance reason to pick one of the two over the other.

There is an option that you didn't show that is quite a bit faster than either of those. You can use a Join to find all of the items in c1 that also exist in c2 without doing a linear search through the sequence:

var query = from first in c1
    join second in c2
    on first.Bar equals second.Bar
    select first;

Another option would be to use a HashSet instead of a List, as that can be much more easily searched:

var set = new HashSet<string>(c1.Select(item => item.Bar));

var query = c2.Where(item => set.Contains(item.Bar));

(This solution is pretty close to what Join will do internally.)

Both of these solutions will be a lot faster than either of your proposed solutions.

Arad Alvand
  • 8,607
  • 10
  • 51
  • 71
Servy
  • 202,030
  • 26
  • 332
  • 449
-1

Your first approach will iterate and compare once and return the results.

The second query will be slower because it will iterate and extract the Bar property into a collection, and then iterate and compare against f.Bar to create the final result.

Rob Hardy
  • 1,821
  • 15
  • 15
  • 5
    This is wrong. Linq queries are evaluated lazily. It will only extract the `Bar` property as it needs to, and it won't create an intermediate collection. – p.s.w.g Jun 25 '13 at 16:11
  • @p.s.w.g Thank you, that answers the last point in my question. – jan Jun 26 '13 at 07:00