712

I assume there's a simple LINQ query to do this, but I'm not exactly sure how to write it.

Given this piece of code:

class Program
{
    static void Main(string[] args)
    {
        List<Person> peopleList1 = new List<Person>();
        peopleList1.Add(new Person() { ID = 1 });
        peopleList1.Add(new Person() { ID = 2 });
        peopleList1.Add(new Person() { ID = 3 });

        List<Person> peopleList2 = new List<Person>();
        peopleList2.Add(new Person() { ID = 1 });
        peopleList2.Add(new Person() { ID = 2 });
        peopleList2.Add(new Person() { ID = 3 });
        peopleList2.Add(new Person() { ID = 4 });
        peopleList2.Add(new Person() { ID = 5 });
    }
}

class Person
{
    public int ID { get; set; }
}

I would like to perform a LINQ query to give me all of the people in peopleList2 that are not in peopleList1.

This example should give me two people (ID = 4 & ID = 5)

Massimiliano Kraus
JSprang
  • 3
    Perhaps it's a good idea to make ID read-only, since the identity of an object shouldn't change over its lifetime. Unless, of course, your testing or ORM framework requires it to be mutable. – CodesInChaos Oct 15 '10 at 18:18
  • 3
    Could we call this a "Left (or Right) Excluding Join" according to [this diagram?](https://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins) – Nate Anderson Jun 05 '17 at 16:53

11 Answers

1213

This can be addressed using the following LINQ expression:

var result = peopleList2.Where(p => !peopleList1.Any(p2 => p2.ID == p.ID));

An alternate way of expressing this via LINQ, which some developers find more readable:

var result = peopleList2.Where(p => peopleList1.All(p2 => p2.ID != p.ID));

Warning: As noted in the comments, these approaches are O(n*m), where n and m are the sizes of the two lists. That may be fine for small lists, but it can cause performance problems when the data sets are large. If this doesn't satisfy your performance requirements, you may need to evaluate other options. Since the stated requirement is for a solution in LINQ, however, those options aren't explored here. As always, evaluate any approach against the performance requirements of your project.

Jeremy Caney
Klaus Byskov Pedersen
  • 46
    You are aware that that's a O(n*m) solution to a problem that can easily be solved in O(n+m) time? – Niki Oct 15 '10 at 18:21
  • Yeah, it wouldn't let me mark it as the answer right away, said I needed to wait 5 minutes :) Thanks again! – JSprang Oct 15 '10 at 18:22
  • 42
    @nikie, the OP asked for a solution that uses Linq. Maybe he's trying to learn Linq. If the question had been for the most efficient way, my answer would not necessarily have been the same. – Klaus Byskov Pedersen Oct 15 '10 at 18:27
  • 69
    @nikie, care to share your easy solution? – Rubio Sep 02 '14 at 06:25
  • 21
    This is equivalent and I find easier to follow: var result = peopleList2.Where(p => peopleList1.All(p2 => p2.ID != p.ID)); – AntonK Jun 21 '16 at 01:30
  • 3
    @KlausByskovPedersen while your answer is spot on for the given question, please note that Google brings people like me who search for the same question without the "Use Linq" bit. I was already aware of where but was looking to see if there was a better solution. So would you consider adding the Except bit to your answer as well? – Menol Nov 02 '16 at 10:58
  • You could also use .All(), making it slightly clearer, var result = peopleList2.Where(p => peopleList1.All(p2 => p2.ID != p.ID)); – Ernest Jul 06 '17 at 21:18
  • 2
    if in fact you use Resharper it will suggest you change the 'Any' to 'All' as mentioned above – EzaBlade Jul 20 '17 at 12:06
  • 48
    @Menol - it might be a bit unfair to criticize someone who correctly responds to a question. People shouldn't need to anticipate all the ways and contexts that future people might stumble onto the answer. In reality, you should direct that to nikie - who took the time to state that they knew of an alternative without providing it. – Chris Rogers Sep 11 '17 at 23:46
  • 6
    @ChrisRogers I agree with your point. Just to be clear, I didn't mean to criticize Klaus, I was merely pointing him to the problem I faced hoping he would improve his answer. I do apologize if I didn't do a good job expressing it correctly. – Menol Sep 20 '17 at 08:15
  • 1
    FYI: I’ve updated the answer with notes from these comments. Notably, this includes the alternate formulation from @AntonK. I _also_ included @Niki’s warning—though the alternate formulation is potentially faster, as it will stop if it finds a match. (Obviously, though, it’s only faster if there is actually overlap between the two sets.) – Jeremy Caney Jan 13 '20 at 08:16
  • 7
    @Niki, so what is the O(m+n) approach@_@? – Gen.L May 26 '20 at 17:54
  • 4
    @Gen.L He is probably referring to implementing IEquatable on the object, which uses hash tables, and then using LINQ's .Except. This isn't always possible or practical; if the lists to be compared will always be small, might as well use .Where/.All instead of .Except. If it takes 0.2 seconds to iterate through the lists with .Where/.All and 0.07 seconds with .Except am I really going to spend 10 minutes implementing IEquatable, complicating the code base, and explaining IEquatable to the other devs on my team who aren't familiar with it? Probably not. – jspinella Sep 29 '20 at 17:57
  • See CodesInChaos's [answer](https://stackoverflow.com/a/3944821/491651) for the more performant, non-LINQ version. – Mike Pennington Mar 21 '22 at 05:19
533

If you override the equality of Person (both Equals and GetHashCode), then you can also use:

peopleList2.Except(peopleList1)

Except should be significantly faster than the Where(...Any) variant since it can put the second list into a hash table. Where(...Any) has a runtime of O(peopleList1.Count * peopleList2.Count), whereas variants based on HashSet<T> have an (almost) linear runtime of O(peopleList1.Count + peopleList2.Count).

Except implicitly removes duplicates. That shouldn't affect your case, but might be an issue for similar cases.

Or if you want fast code but don't want to override the equality:

var excludedIDs = new HashSet<int>(peopleList1.Select(p => p.ID));
var result = peopleList2.Where(p => !excludedIDs.Contains(p.ID));

This variant does not remove duplicates.
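
As a rough sketch of what "overriding the equality" could look like for the Except variant: the Person class below is an assumed rewrite of the one in the question, comparing and hashing by ID only, and ExceptDemo and MissingIds are hypothetical names used purely for illustration. Both Equals and GetHashCode are overridden, because Except relies on hashing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed variant of the question's Person: two instances are equal when their IDs match.
public sealed class Person : IEquatable<Person>
{
    public int ID { get; set; }

    public bool Equals(Person other) => other != null && ID == other.ID;

    public override bool Equals(object obj) => Equals(obj as Person);

    // Must agree with Equals: equal people produce equal hash codes.
    public override int GetHashCode() => ID.GetHashCode();
}

// Hypothetical demo class, not part of the original answer.
public static class ExceptDemo
{
    public static List<int> MissingIds()
    {
        var peopleList1 = Enumerable.Range(1, 3).Select(i => new Person { ID = i }).ToList();
        var peopleList2 = Enumerable.Range(1, 5).Select(i => new Person { ID = i }).ToList();

        // Except uses the overridden equality, so people with IDs 1 to 3 are dropped.
        return peopleList2.Except(peopleList1).Select(p => p.ID).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", MissingIds())); // prints "4, 5"
    }
}
```

Because the hash code is derived from a mutable ID, changing the ID of a Person that is already stored in a hash-based collection would break lookups; making ID read-only avoids that.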

CodesInChaos
  • 1
    That would only work if `Equals` had been overridden to compare IDs. – Klaus Byskov Pedersen Oct 15 '10 at 18:05
  • 51
    That's why I wrote that you need to override the equality. But I've added an example which works even without that. – CodesInChaos Oct 15 '10 at 18:10
  • 4
    It would also work if Person was a struct. As it is though, Person seems an incomplete class as it has a property called "ID" which does not identify it - if it did identify it, then equals would be overridden so that equal ID meant equal Person. Once that bug in Person is fixed, this approach is then better (unless the bug is fixed by renaming "ID" to something else that doesn't mislead by seeming to be an identifier). – Jon Hanna Oct 15 '10 at 18:12
  • 6
    It also works great if you're talking about a list of strings (or other base objects), which was what I was searching for when I came upon this thread. – Dan Korn Oct 04 '17 at 19:26
  • @DanKorn Same, this is a simpler solution than the Where approach for basic comparisons (ints, object references, strings). – Maze May 10 '18 at 08:28
  • It seems you also need to override `GetHashCode()`. Which makes sense now, since you mentioned a hashtable. But implementing only `Equals()` returns incorrect results. – Ian Oct 13 '20 at 01:40
  • @Ian Actually you have to [override both Equals() and GetHashCode()](https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.except?view=net-5.0#System_Linq_Enumerable_Except__1_System_Collections_Generic_IEnumerable___0__System_Collections_Generic_IEnumerable___0__System_Collections_Generic_IEqualityComparer___0__), no matter what – leguminator Feb 01 '21 at 16:12
  • I tried the Where(...Any) solution on a dataset with ~200,000 records and my query timed out, every time. That's probably correct, according to the answer, but for those doing this in a live setting I would recommend using the Except() solution here. – E Benzle Mar 16 '21 at 18:44
  • I used your second code example with the hashset, it worked very well for me for the scenario where I am joining two different classes. – ScoMo May 24 '22 at 14:51
82

Or if you want it without negation:

var result = peopleList2.Where(p => peopleList1.All(p2 => p2.ID != p.ID));

Basically, it says: get every person from peopleList2 whose ID differs from all of the IDs in peopleList1.

Just a little bit different approach from the accepted answer :)

Sum None
user1271080
  • 6
    This method (list of over 50,000 items) was significantly faster than the ANY method! – DaveN Jul 23 '17 at 08:59
  • 12
    This might be faster just because it is lazy. Note that this is not doing any real work just yet. It's not until you enumerate the list that it actually does the work (by calling ToList or using it as part of a foreach loop, etc.) – Xtros Jun 11 '18 at 06:23
37

Since all of the solutions to date used fluent syntax, here is a solution in query expression syntax, for those interested:

var peopleDifference = 
  from person2 in peopleList2
  where !(
      from person1 in peopleList1 
      select person1.ID
    ).Contains(person2.ID)
  select person2;

I think it is different enough from the answers given to be of interest to some, even though it would most likely be suboptimal for Lists. For tables with indexed IDs, however, this would definitely be the way to go.

Michael Goldshteyn
19

Bit late to the party, but a good solution that is also LINQ to SQL compatible is:

List<string> list1 = new List<string>() { "1", "2", "3" };
List<string> list2 = new List<string>() { "2", "4" };

List<string> inList1ButNotList2 = (from o in list1
                                   join p in list2 on o equals p into t
                                   from od in t.DefaultIfEmpty()
                                   where od == null
                                   select o).ToList<string>();

List<string> inList2ButNotList1 = (from o in list2
                                   join p in list1 on o equals p into t
                                   from od in t.DefaultIfEmpty()
                                   where od == null
                                   select o).ToList<string>();

List<string> inBoth = (from o in list1
                       join p in list2 on o equals p into t
                       from od in t.DefaultIfEmpty()
                       where od != null
                       select od).ToList<string>();

Kudos to http://www.dotnet-tricks.com/Tutorial/linq/UXPF181012-SQL-Joins-with-C

Dovydas Šopa
Richard Ockerby
16

This Enumerable extension lets you supply a list of items to exclude and a key selector function that is used to compare the items.

public static class EnumerableExtensions
{
    public static IEnumerable<TSource> Exclude<TSource, TKey>(this IEnumerable<TSource> source,
    IEnumerable<TSource> exclude, Func<TSource, TKey> keySelector)
    {
       var excludedSet = new HashSet<TKey>(exclude.Select(keySelector));
       return source.Where(item => !excludedSet.Contains(keySelector(item)));
    }
}

You can use it this way:

list1.Exclude(list2, i => i.ID);
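
On .NET 6 or later, the built-in Enumerable.ExceptBy appears to cover much the same ground as the Exclude extension above. The sketch below assumes .NET 6+ and a Person class like the one in the question; ExceptByDemo and MissingIds are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int ID { get; set; }
}

// Hypothetical demo class, not part of the original answer.
public static class ExceptByDemo
{
    public static List<int> MissingIds()
    {
        var peopleList1 = Enumerable.Range(1, 3).Select(i => new Person { ID = i }).ToList();
        var peopleList2 = Enumerable.Range(1, 5).Select(i => new Person { ID = i }).ToList();

        // ExceptBy takes the keys to exclude plus a key selector for the source items.
        return peopleList2
            .ExceptBy(peopleList1.Select(p => p.ID), p => p.ID)
            .Select(p => p.ID)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", MissingIds())); // prints "4, 5"
    }
}
```

One difference to keep in mind: ExceptBy has set semantics, so source items that share a key are returned only once, whereas the Exclude extension keeps duplicates.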
Bertrand
  • By having the code that @BrianT has, how could I convert it to use your code? – Nicke Manarin Jul 24 '19 at 13:23
  • Create a new class somewhere with the EnumerableExtensions code in Bertrand's reply. Add using statement in class where query is performed. Then change the selection code to `var result = peopleList2.Exclude(peopleList1, i => i.ID);` – Shane Knowles Dec 01 '21 at 14:27
14

Klaus' answer was great, but ReSharper will ask you to "Simplify LINQ expression":

var result = peopleList2.Where(p => peopleList1.All(p2 => p2.ID != p.ID));

Brian T
  • It's worth noting that this trick won't work if there's more than one property binding the two objects (think SQL composite key). – Alrekr Oct 02 '18 at 20:18
  • Alrekr - If what you mean to say is "you will need to compare more properties if more properties need comparing" then I'd say that's pretty obvious. – Lucas May 16 '20 at 17:22
2

Once you write a generic FuncEqualityComparer you can use it everywhere.

peopleList2.Except(peopleList1, new FuncEqualityComparer<Person>((p, q) => p.ID == q.ID));

public class FuncEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> comparer;
    private readonly Func<T, int> hash;

    public FuncEqualityComparer(Func<T, T, bool> comparer)
    {
        this.comparer = comparer;
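        // If T does not override GetHashCode, use a constant hash so that every
        // pair of items is checked with the comparer (correct, but O(n*m)).
        if (typeof(T).GetMethod(nameof(object.GetHashCode)).DeclaringType == typeof(object))
            hash = (_) => 0;
        else
            // Otherwise T's own hash is used; it must agree with the comparer.
            hash = t => t.GetHashCode();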
    }

    public bool Equals(T x, T y) => comparer(x, y);
    public int GetHashCode(T obj) => hash(obj);
}
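
The comparer above falls back to a constant hash code when T does not override GetHashCode, which keeps Except correct but makes it compare every pair of items. One possible alternative, sketched here under the assumption that equality can be expressed through a single key, is to derive both equality and the hash code from a key selector. KeyEqualityComparer, KeyComparerDemo, and MissingIds are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int ID { get; set; }
}

// Hypothetical helper: equality and hashing both come from the same key.
public sealed class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> keySelector;
    private readonly IEqualityComparer<TKey> keyComparer = EqualityComparer<TKey>.Default;

    public KeyEqualityComparer(Func<T, TKey> keySelector)
    {
        this.keySelector = keySelector ?? throw new ArgumentNullException(nameof(keySelector));
    }

    public bool Equals(T x, T y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return keyComparer.Equals(keySelector(x), keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        if (obj == null) return 0;
        TKey key = keySelector(obj);
        return key == null ? 0 : keyComparer.GetHashCode(key);
    }
}

// Hypothetical demo class.
public static class KeyComparerDemo
{
    public static List<int> MissingIds()
    {
        var peopleList1 = Enumerable.Range(1, 3).Select(i => new Person { ID = i }).ToList();
        var peopleList2 = Enumerable.Range(1, 5).Select(i => new Person { ID = i }).ToList();

        var byId = new KeyEqualityComparer<Person, int>(p => p.ID);
        return peopleList2.Except(peopleList1, byId).Select(p => p.ID).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", MissingIds())); // prints "4, 5"
    }
}
```

Since the hash code is computed from the same key that Equals compares, the two cannot disagree, and Except can use its hash set effectively without Person itself overriding anything.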
Wouter
1

First, extract the IDs of the items in the collection that match the condition:

List<int> indexes_Yes = this.Contenido.Where(x => x.key == "TEST").Select(x => x.Id).ToList();

Second, use a Contains check to select the IDs that are not in that selection:

List<int> indexes_No = this.Contenido.Where(x => !indexes_Yes.Contains(x.Id)).Select(x => x.Id).ToList();

Obviously you could just use x.key != "TEST" here, but this is only an example.

Ángel Ibáñez
0

Here is a working example that gets the IT skills that a job candidate does not already have.

//Get a list of skills from the Skill table
IEnumerable<Skill> skillenum = skillrepository.Skill;
//Get a list of skills the candidate has                   
IEnumerable<CandSkill> candskillenum = candskillrepository.CandSkill
       .Where(p => p.Candidate_ID == Candidate_ID);             
//Using the enum lists with LINQ filter out the skills not in the candidate skill list
IEnumerable<Skill> skillenumresult = skillenum.Where(p => !candskillenum.Any(p2 => p2.Skill_ID == p.Skill_ID));
//Assign the selectable list to a viewBag
ViewBag.SelSkills = new SelectList(skillenumresult, "Skill_ID", "Skill_Name", 1);
Qix - MONICA WAS MISTREATED
Brian Quinn
-1
class Program
{
    static void Main(string[] args)
    {
        List<Person> peopleList1 = new List<Person>();
        peopleList1.Add(new Person() { ID = 1 });
        peopleList1.Add(new Person() { ID = 2 });
        peopleList1.Add(new Person() { ID = 3 });

        List<Person> peopleList2 = new List<Person>();
        peopleList2.Add(new Person() { ID = 1 });
        peopleList2.Add(new Person() { ID = 2 });
        peopleList2.Add(new Person() { ID = 3 });
        peopleList2.Add(new Person() { ID = 4 });
        peopleList2.Add(new Person() { ID = 5 });

        // Keep the people in peopleList2 whose ID does not appear in peopleList1.
        var leftPeeps = peopleList2.Where(x => !peopleList1.Select(y => y.ID).Contains(x.ID))?.ToList() ?? new List<Person>();
    }
}

class Person
{
    public int ID { get; set; }
}

Notice the !peopleList1.Select(y => y.ID).Contains(x.ID) expression. The Select projects peopleList1 onto the property we want to compare (ID), Contains checks whether the current person's ID is in that projection, and the ! keeps only the people whose ID is not. This may leave us with no entries. Note that Where and ToList never return null, so in that case the result is simply an empty list; the null-conditional and null-coalescing operators are only a defensive guard.

Patrick Knott