Complement of Two Lists?

Question

lets say I have a list of strings:

A,B,C,D

Then another list of strings

B,C,D

I want to know what elements are in the first list that aren't in the second list, so the result would be A

I don't know the name of the extension method to do this is. I know I can use concat, union, intersect for similar list comparisons, but just don't know the name to accomplish this particular task.

Addendum, I am interested in duplicates, so if the first list is:

A,A,A,B,C,D

and the second list is

B,C,D

i want to get

A,A,A

Thanks!

Use sets if you do this with more than a few times with small lists. That's not only the more appropriate approach, also a thousand times better complexity-wise. — , May 09 '11 at 15:57

score 20 · Accepted Answer · answered May 09 '11 at 15:58

20

You can use the Except Extension Method to get all elements in a list that are not in a second list:

var result = list1.Except(list2);

answered May 09 '11 at 15:58

dtb

213,145
36
401
431

score 4 · Answer 2 · answered May 09 '11 at 15:57

4

var result = list1.Where(i => !list2.Contains(i));

answered May 09 '11 at 15:57

Bala R

107,317
23
199
210

There's not a built in extension method for this? – sooprise May 09 '11 at 15:57
You can build an extension method that does this. – Chase Florell May 09 '11 at 15:59
I see, this is significantly slower than the built in Except, but since it contains duplicates, I will use it (for now) Thanks! :) – sooprise May 09 '11 at 16:11
are you saying the processing time is slower? – Chase Florell May 09 '11 at 16:18
Yea, it takes much longer to process. The fact that there are two union methods for duplicates, and no duplicates throws me off. Shouldn't there be two methods for duplicates and no duplicates for except as well? – sooprise May 09 '11 at 16:20
@Rock: This is `O(n^2)`, whereas `Except` is `O(n)`. – SLaks May 09 '11 at 16:20

score 4 · Answer 3 · answered May 09 '11 at 18:34

The "Except" method in the BCL removes all duplicates, which is not what you want.

If the lists in the problem are large, then to do this efficiently you probably want to waste memory in exchange for saving on time. Something like:

// yield all members of "sequence" omitting those in "except"
static IEnumerable<string> Filter(
    this IEnumerable<string> sequence, 
    IEnumerable<string> except)
{
    var set = new HashSet<string>(except); // Burn memory to save time
    return from item in sequence 
           where !set.Contains(item) 
           select item;
}

That way you get fast lookup every time you test an item.

Call it with

var sequence = new List<string>() { A, B, A, C, D };
var except = new List<string>() { B, C };
var result = sequence.Filter(except).ToList();

Note that if the first list is A,A,B and the second list A, this will return B. — Brian, May 09 '11 at 20:39

score 0 · Answer 4 · answered May 10 '11 at 03:21

If your definition of duplicates includes both lists and you want to compute the complement efficiently, then you will need to use a different data structure: a bag. A bag is a set that allows duplicates.

Here is an extension method called BagDifference that efficiently accounts for duplicates in either list together with a sample program inspired by Eric's answer.

public class Bag<T> : Dictionary<T, int>
{
    public Bag(IEnumerable<T> sequence)
    {
        foreach (var item in sequence)
        {
            if (!ContainsKey(item)) this[item] = 0;
            ++this[item];
        }
    }
}

public static class EnumerableExtensions
{
    public static IEnumerable<T> BagDifference<T>(this IEnumerable<T> sequence1, IEnumerable<T> sequence2)
    {
        var bag1 = new Bag<T>(sequence1);
        var bag2 = new Bag<T>(sequence2);
        foreach (var item in bag1.Keys)
        {
            var count1 = bag1[item];
            var count2 = bag2.ContainsKey(item) ? bag2[item] : 0;
            var difference = Math.Max(0, count1 - count2);
            for (int i = 0; i < difference; i++)
                yield return item;
        }
    }
}

class Program
{

    static void Main(string[] args)
    {
        var sequence = new List<string>() { "A", "B", "A", "C", "D" };
        var except = new List<string>() { "A", "B", "C", "C" };
        var difference = sequence.BagDifference(except).ToList();
    }
}

Complement of Two Lists?

4 Answers4

Linked