Use LINQ to remove items in List

Question

This question is very similar to this one: Use LINQ to get items in one List<>, that are not in another List<>. But the differences are just enough that I'm having trouble nailing down the LINQ syntax.

I have two lists:

List<Fubar> fewBarNew
List<string> existingProviderIDs

Where Fubar looks like:

Class Fubar
{
    int FunbarId int {get; set;}
    ....
    ....
    string ProviderID {get; set;}
}

Now, I want to remove from fewBarNew any instance where FewBarNew.ProviderID exists inside existingProviderIDs.

 fewBarNew = fewBarNew.Where(f => !existingProviderIdList.Any(ep => ?????).ToList();

@Rango and the difference is ... ? O(n^2) either way, full enumeration either way. — Panagiotis Kanavos, Mar 12 '19 at 15:57
@PanagiotisKanavos The difference is one is a LINQ method and one isn't. Thus there are no LINQ methods in that code. — Servy, Mar 12 '19 at 15:58

Gilad Green · Accepted Answer · 2019-03-12T16:03:02.473

6

Any enables you to check if any item in a collections matches some predicate. So you could define the predicate to be "if any item matches the current":

fewBarNew.Where(f => !existingProviderIdList.Any(ep => ep == f.ProviderID));

However, I think a cleaner way will be to use .Contains:

var result = fewBarNew.Where(f => !existingProviderIDs.Contains(f.ProviderID));

Then, as this performs in O(n^2) you can improve using a HashSet<string> instead:

var existingProviderIDSet = new HashSet<string>(existingProviderIDs);
var result = fewBarNew.Where(f => !existingProviderIDSet.Contains(f.ProviderID));

As HashSet's Contains performs an O(1) operation, this will execute in O(n).

edited Mar 12 '19 at 16:03

answered Mar 12 '19 at 15:55

Gilad Green

36,708
7
61
95

Thank you for the tip on performance as well, in addition to just the answer! – Casey Crookston Mar 12 '19 at 15:59
@GiladGreen I do the `.Where` thing all the time; but I've never done the HashSet. While I understand the `.Contains` is O(n), I wouldn't think this counters the cost of the creation of the HashSet (in time and resource consumption) where you already have the List. Am I misunderstanding? Or are you just suggesting that the HashSet would have been a better choice from the outset? – Clay Mar 12 '19 at 16:15
@Clay - it is very case dependent. I've have a list of ~200k items and a second which was also quite big. For that case it was very much worth it to create the hash-set and then use it. Best is to create a scenario where you populate two lists like this with different numbers of items and plot the execution time. Then you will see how much is the overhead of creating the hashset compared to the added value in performance – Gilad Green Mar 12 '19 at 16:19
@Clay `O(N)` is the cost for a *single* `List.Contains` call. When you perform a `Contains` for every item in the other list, the cost is `O(N^2)`. [HashSet.Contains](https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.hashset-1.contains?view=netframework-4.7.2) is O(1) though. At some point, the cost of repeated `List.Contains` calls will become larger than the initial cost of creating the HashSet – Panagiotis Kanavos Mar 12 '19 at 16:35

1 Answers1