20

I've got a HashSet,

var universe = new HashSet<int>();

And a bunch of subsets,

var sets = new List<HashSet<int>>(numSets);

I want to subtract a chunk, which I can do like this:

var remaining = universe.ExceptWith(sets[0]);

But ExceptWith works in-place. I don't want to modify the universe. Should I clone it first, or is there a better way?

mpen
  • 272,448
  • 266
  • 850
  • 1,236

6 Answers6

19

I guess I should clone it first? How do I do that?

var universe = new HashSet<int>();
var subset = new HashSet<int>();
...

// clone the universe
var remaining = new HashSet<int>(universe);
remaining.ExceptWith(subset);

Not as simple as with the Except extension method, but probably faster (you should run a few performance tests to make sure)

Thomas Levesque
  • 286,951
  • 70
  • 623
  • 758
  • 1
    Unfortunately, the [`new HashSet(IEnumerable)`](https://msdn.microsoft.com/en-us/library/bb301504.aspx) you are using does not make use of the fact that the existing set only contains distinct elements and calls the expensive 'Add(item)` method for every single item, rather than efficiently shallow-cloning the internal data structures, i.e. with increasingly large `universe`s, this gets much slower than it could be. Hence: +1 for your follow-up question: [Efficient way to clone a HashSet?](http://stackoverflow.com/q/3927789/709537) – Evgeniy Berezovsky Jul 03 '15 at 02:10
12

How about Except()?

var x = new HashSet<int>();
var y = new HashSet<int>();

var xminusy = new HashSet<int>(x.Except(y));
James McNellis
  • 348,265
  • 75
  • 913
  • 977
  • But `Except` is an extension method, `ExceptWith` is specifically built to work with `HashSets`... is this just as efficient? – mpen Oct 09 '10 at 19:47
  • 1
    @Mark, it's definitely less efficient than *just* `ExceptWith`, but it's about as efficient as *cloning* it first and then calling `ExceptWith`. – Kirk Woll Oct 10 '10 at 00:02
  • 4
    @Kirk: Finally got around to testing this. Not true. It's still about 40% slower. http://programanddesign.com/cs/subtracting-sets/ – mpen Dec 31 '10 at 05:08
  • 1
    @Ralph: Very interesting. This is why I usually stick to answering C++ questions ;-) – James McNellis Dec 31 '10 at 05:11
  • 1
    Except is superior in the scenario where you need an IEqualityComparer (imagine using 2 sets of serialized objects, where the object's contents are identical, but their HashCode is not), unfortunately ExceptWith does not support a custom IEqualityComparer, but the Except extension method does. In the case of integers like here, that is ofc not an issue. – Joris Feb 14 '11 at 00:45
  • @Joris: But a `HashSet`'s constructor accepts a custom `IEqualityComparer`. – StriplingWarrior Apr 08 '19 at 18:04
11

I benchmarked Linq's Except method against cloning and using the HashSet-native function ExceptWith. Here are the results.

static class Program
{
    public static HashSet<T> ToSet<T>(this IEnumerable<T> collection)
    {
        return new HashSet<T>(collection);
    }

    public static HashSet<T> Subtract<T>(this HashSet<T> set, IEnumerable<T> other)
    {
        var clone = set.ToSet();
        clone.ExceptWith(other);
        return clone;
    }

    static void Main(string[] args)
    {
        var A = new HashSet<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        var B = new HashSet<int> { 2, 4, 6, 8, 10 };
        var sw = new Stopwatch();

        sw.Restart();
        for (int i = 0; i < 1000000; ++i)
        {
            var C = A.Except(B).ToSet();
        }
        sw.Stop();
        Console.WriteLine("Linq: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < 1000000; ++i)
        {
            var C = A.Subtract(B);
        }
        sw.Stop();
        Console.WriteLine("Native: {0} ms", sw.ElapsedMilliseconds);

        Console.ReadLine();
    }
}

Linq: 1297 ms
Native: 762 ms

http://programanddesign.com/cs/subtracting-sets/

mpen
  • 272,448
  • 266
  • 850
  • 1,236
1

A hash set has to track its hash algorithm constants, and its overflow bins. The elements in the set are held by reference. Creating a new hash with the copy constructor, as Thomas Levesque suggests, creates a shallow copy of this overhead and should be quite fast. Using Except() in the way that James McNellis suggests first creates an anonymous copy and then passes that to the copy constructor which uses the fields in the anonymous to initialize its own fields. As Thomas said, you might do a few performance tests, but theoretically his answer should beat James' answer. And by the way, to my way of thinking, a shallow copy is not a clone since I believe a clone implies that the underlying elements are also copied. Hash sets with common elements use a copy when modified strategy.

verisimilidude
  • 738
  • 3
  • 8
  • Yeah.. you're right, I don't think I need a deep copy anyway. Using int's in this example, but they'll be classes in practice; a reference is fine though. – mpen Oct 10 '10 at 08:57
0

Very late answer but maybe useful sometimes.

@mpen answered by using Linq's Except(IEnumerable<>)

Which make linq loop trough IEnumerable check if it's contains.

How about

setA.Where(i => !setB.Contains(i))

kitta
  • 1,723
  • 3
  • 23
  • 33
0

Obviously in a few cases 'manually' adding items in a loop is more efficient than copying the whole set and then removing items. One I can think of ...

// no more set ops planned? then returning list is an option
public static List<T> ExceptWith<T>(HashSet<T> allObjects, Hashset<T> minus)
{
    //  Set Capacity of list   (allObjects.Count-minus.Count?)
    List<T> retlst = new List<T>(allObjects.Count); 

    foreach( var obj in allObjects) {
        if( minus.Contains(obj)==false)
            retlst.Add(obj);
    }
    return retlst;
}

// Special case where quantity of copying will be high
// more expensive in fact than just adding
public static HashSet<T> ExceptWith<T>(HashSet<T> allObjects, HashSet<T> minus)
{
    if( minus.Count > allObjects.Count * 7/8 )
    {
        HashSet<T> retHash = new HashSet<T>(); 

        foreach( var obj in allObjects) {
            if( minus.Contains(obj)==false)
                retHash.Add(obj);
        }
        return retHash;

    }
    else
    {
        // usual clone and remove
    }
}
PJJ
  • 61
  • 5