20

What's the simplest way to perform a set subtraction given two arrays in C#? Apparently this is dead easy in Ruby. Basically I just want to remove the elements from array a that are in array b:

string[] a = new string[] { "one", "two", "three", "four" };
string[] b = new string[] { "two", "four", "six" };
string[] c = a - b; // not valid

c should equal { "one", "three" }. b - a would yield { "six" }.

skaffman
  • 398,947
  • 96
  • 818
  • 769
devios1
  • 36,899
  • 45
  • 162
  • 260

2 Answers2

42

If you're using Linq, you can use the Except operator like this:

string [] c = a.Except(b).ToArray();

Edit: CodeInChaos makes a good point. If a contains duplicates, it will remove any duplicates as well. The alternative to make it function exactly like the Ruby version would be this:

string [] c = a.Where(x=>!b.Contains(x)).ToArray();
Keltex
  • 26,220
  • 11
  • 79
  • 111
  • 5
    Note that this will only return unique elements. So if you have one element twice in `a` it will only keep the first one. – CodesInChaos Feb 20 '11 at 17:25
  • Excellent, thanks--don't know how I missed that. Guess the naming just threw me off. – devios1 Feb 20 '11 at 17:27
  • 1
    The alternative will remove duplicates! – xanatos Feb 20 '11 at 17:32
  • Note that your alternative code is `O(a.Length*b.Length)`. To make it fast you need to construct a `HashSet` out of `b`. – CodesInChaos Feb 20 '11 at 17:33
  • @CodeInChaos - True, but I'm not trying to optimize, just show the syntax. Ideally you would write your own extension `ExceptLikeRuby` or something that would wrap this functionality. – Keltex Feb 20 '11 at 17:35
  • @Keltez Your alternative will still remove duplicates. If you have { 'A', 'A' } - {'A'} == { } – xanatos Feb 20 '11 at 17:51
  • 2
    @xanatos - That's the proper behavior, to remove all elements from A that are in B. On the other hand my alternative will do {'B','B'} - {'A'} = {'B','B'} – Keltex Feb 20 '11 at 17:53
4
public static IEnumerable<T> Minus<T>(this IEnumerable<T> enum1, IEnumerable<T> enum2)
{
    Dictionary<T, int> elements = new Dictionary<T, int>();

    foreach (var el in enum2)
    {
        int num = 0;
        elements.TryGetValue(el, out num);
        elements[el] = num + 1;
    }

    foreach (var el in enum1)
    {
        int num = 0;
        if (elements.TryGetValue(el, out num) && num > 0)
        {
            elements[el] = num - 1;
        }
        else
        {
            yield return el;
        }
    }
}

This won't remove duplicates from enum1. To be clear:

  1. { 'A', 'A' } - { 'A' } == { 'A' }
  2. { 'A', 'A' } - { 'A' } == { }

I do the first, Enumerable.Except does the second.

xanatos
  • 109,618
  • 12
  • 197
  • 280
  • `elements.Remove` should be `elements.Contains` – Keltex Feb 20 '11 at 17:50
  • @Keltex, the idea is that you want to remove it because it's been counted already. However, I believe it should be a `List` instead of a `HashSet` because it otherwise won't account for `{a, a, a}` - `{a, a}` = `{a}`. – devios1 Feb 20 '11 at 17:57
  • No, it should be a Dictionary, so I'm still in a O(m * log(n)) (technically probably O((m + 1) * log(n)) ) – xanatos Feb 20 '11 at 18:03
  • With Linq, you can just do `var elements = enum2.GroupBy(el => el).ToDictionary(elg => elg.Key, elg => elg.Count());`. – NetMage Feb 11 '21 at 18:37