8

lets say I have a list of strings:

A,B,C,D

Then another list of strings

B,C,D

I want to know what elements are in the first list that aren't in the second list, so the result would be A

I don't know the name of the extension method to do this is. I know I can use concat, union, intersect for similar list comparisons, but just don't know the name to accomplish this particular task.

Addendum, I am interested in duplicates, so if the first list is:

A,A,A,B,C,D

and the second list is

B,C,D

i want to get

A,A,A

Thanks!

sooprise
  • 22,657
  • 67
  • 188
  • 276
  • 1
    Use sets if you do this with more than a few times with small lists. That's not only the more appropriate approach, also a thousand times better complexity-wise. –  May 09 '11 at 15:57
  • Thanks, the problem involves thousands of rows – sooprise May 09 '11 at 16:06

4 Answers4

20

You can use the Except Extension Method to get all elements in a list that are not in a second list:

var result = list1.Except(list2);
dtb
  • 213,145
  • 36
  • 401
  • 431
4
var result = list1.Where(i => !list2.Contains(i));
Bala R
  • 107,317
  • 23
  • 199
  • 210
4

The "Except" method in the BCL removes all duplicates, which is not what you want.

If the lists in the problem are large, then to do this efficiently you probably want to waste memory in exchange for saving on time. Something like:

// yield all members of "sequence" omitting those in "except"
static IEnumerable<string> Filter(
    this IEnumerable<string> sequence, 
    IEnumerable<string> except)
{
    var set = new HashSet<string>(except); // Burn memory to save time
    return from item in sequence 
           where !set.Contains(item) 
           select item;
}

That way you get fast lookup every time you test an item.

Call it with

var sequence = new List<string>() { A, B, A, C, D };
var except = new List<string>() { B, C };
var result = sequence.Filter(except).ToList();
Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
0

If your definition of duplicates includes both lists and you want to compute the complement efficiently, then you will need to use a different data structure: a bag. A bag is a set that allows duplicates.

Here is an extension method called BagDifference that efficiently accounts for duplicates in either list together with a sample program inspired by Eric's answer.

public class Bag<T> : Dictionary<T, int>
{
    public Bag(IEnumerable<T> sequence)
    {
        foreach (var item in sequence)
        {
            if (!ContainsKey(item)) this[item] = 0;
            ++this[item];
        }
    }
}

public static class EnumerableExtensions
{
    public static IEnumerable<T> BagDifference<T>(this IEnumerable<T> sequence1, IEnumerable<T> sequence2)
    {
        var bag1 = new Bag<T>(sequence1);
        var bag2 = new Bag<T>(sequence2);
        foreach (var item in bag1.Keys)
        {
            var count1 = bag1[item];
            var count2 = bag2.ContainsKey(item) ? bag2[item] : 0;
            var difference = Math.Max(0, count1 - count2);
            for (int i = 0; i < difference; i++)
                yield return item;
        }
    }
}

class Program
{

    static void Main(string[] args)
    {
        var sequence = new List<string>() { "A", "B", "A", "C", "D" };
        var except = new List<string>() { "A", "B", "C", "C" };
        var difference = sequence.BagDifference(except).ToList();
    }
}
Rick Sladkey
  • 33,988
  • 6
  • 71
  • 95