2

I have two list of objects one of which has a different number of items. Now, if I do a:

 var result = list1.Except(list2);

That would give me the difference between the items that are in list1 and not in list2, right? What I'd like to do, if possible, is to remove from list1 all those items in the same step. What I don't want to do is to have to loop through the "result" list and remove them from list1. Is that possible?

Thanks!

Luis Garcia
  • 1,311
  • 6
  • 19
  • 37

3 Answers3

7

You could use List.RemoveAll:

list1.RemoveAll(t => !list2.Contains(t));

This does not need to create a new collection and it also needs only one loop to remove all items that are in list1 but not in list2. However, i assume that you have misunderstood how LINQ works. Enumerable.Except is implemented using deferred execution. That means it will not be executed until the foreach. Except is also very efficient with large lists since it's using a set approach.

Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
6

Tim's solution is correct, but has a runtime of O(list1.Length * list2.Length). When you use a HashSet<T> with a proper hash you get close to O(O(list1.Length + list2.Length) runtime. This is much faster when list2 contains more than a handful of items.

The downside of my variant is that it needs to allocate the hashset and thus needs more memory.

var set2 = new HashSet<T>(list2);
list1.RemoveAll(item=>!set2.Contains(item));
CodesInChaos
  • 106,488
  • 23
  • 218
  • 262
0

An less efficient alternative to Tim's solution is a simple Where clause (kind of like a subquery in SQL). The benefit is you do not need to modify the original collection:

string[] list1 = new string[] { "a", "b", "c", "d", "e" };
string[] list2 = new string[] { "c", "d", "e", "f", "g" };

var result = list1.Where( i => list2.Contains( i ) ).ToArray();

If you have very large datasets, I would still recommend Tim's approach.

Result:

c
d
e

To use SQL terminology (to help with your googling), this is called an "inner join". In your case, this is a very simple problem since your input lists and result list are of the same type. If you wanted a more complex join between lists of different types, you may find this question helpful:

inner join in linq to entities

Community
  • 1
  • 1
JDB
  • 25,172
  • 5
  • 72
  • 123
  • This is not efficient. It has the same downside as my approach on large lists since it's complexity is `O(list1.Length * list2.Length)`. But it also has to create a new array(note that `ToList` is more efficient than `ToArray`, OP has used lists anyway). OP is also not using Linq-To-Sql but Linq-To-Objects as the `Except` suggest. – Tim Schmelter Sep 18 '13 at 15:25
  • @TimSchmelter - I've modified my post to address your concerns. I did not mean to suggest it was more efficient. Also, though the post was about LINQ to SQL, the syntax is equally valid for LINQ to Entities. I've found a different post which is more (superficially) on-topic – JDB Sep 18 '13 at 15:39