
I have a class that is IComparable:

public class a : IComparable
{
    public int Id { get; set; }
    public string Name { get; set; }

    public a(int id)
    {
        this.Id = id;
    }

    public int CompareTo(object obj)
    {
        return this.Id.CompareTo(((a)obj).Id);
    }
}

When I add a list of objects of this class to a hash set:

a a1 = new a(1);
a a2 = new a(2);
HashSet<a> ha = new HashSet<a>();
ha.Add(a1);
ha.Add(a2);
ha.Add(a1);

Everything is fine and ha.Count is 2, but:

a a1 = new a(1);
a a2 = new a(2);
HashSet<a> ha = new HashSet<a>();
ha.Add(a1);
ha.Add(a2);
ha.Add(new a(1));

Now ha.Count is 3.

  1. Why doesn't HashSet respect a's CompareTo method?
  2. Is HashSet the best way to have a list of unique objects?
nima
  • Add an implementation of `IEqualityComparer` in the constructor or implement it in the class `a`. https://msdn.microsoft.com/en-us/library/bb301504(v=vs.110).aspx – Jaider Aug 17 '16 at 20:23

5 Answers


It uses an IEqualityComparer<T> (EqualityComparer<T>.Default unless you specify a different one on construction).

When you add an element to the set, it will find the hash code using IEqualityComparer<T>.GetHashCode, and store both the hash code and the element (after checking whether the element is already in the set, of course).

To look an element up, it will first use the IEqualityComparer<T>.GetHashCode to find the hash code, then for all elements with the same hash code, it will use IEqualityComparer<T>.Equals to compare for actual equality.
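
The lookup order described above can be observed with a small sketch. `TracingComparer` and `TracingDemo` are hypothetical names, not part of the answer, and the hash is deliberately coarse (string length) so that two different strings collide and force an `Equals` call:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer that records each call so the lookup order is visible.
public sealed class TracingComparer : IEqualityComparer<string>
{
    public List<string> Calls { get; } = new List<string>();

    public bool Equals(string x, string y)
    {
        Calls.Add($"Equals({x}, {y})");
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    public int GetHashCode(string obj)
    {
        Calls.Add($"GetHashCode({obj})");
        // Deliberately coarse hash (the length) so different strings collide.
        return obj.Length;
    }
}

public static class TracingDemo
{
    public static void Main()
    {
        var comparer = new TracingComparer();
        var set = new HashSet<string>(comparer);

        set.Add("ab");   // hashed; nothing stored yet, so nothing to compare against
        set.Add("cd");   // same hash as "ab", so Equals decides they are different
        set.Add("xyz");  // different hash, so no Equals call is needed

        foreach (var call in comparer.Calls)
            Console.WriteLine(call);
        Console.WriteLine(set.Count);
    }
}
```

A real comparer should spread hash codes much better than this; the collision here exists only to make the `Equals` call show up in the trace.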

That means you have two options:

  • Pass a custom IEqualityComparer<T> into the constructor. This is the best option if you can't modify the T itself, or if you want a non-default equality relation (e.g. "all users with a negative user ID are considered equal"). This is almost never implemented on the type itself (i.e. Foo doesn't implement IEqualityComparer<Foo>) but in a separate type which is only used for comparisons.
  • Implement equality in the type itself, by overriding GetHashCode and Equals(object). Ideally, implement IEquatable<T> in the type as well, particularly if it's a value type. These methods will be called by the default equality comparer.
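
As a rough sketch of the two options, assuming identity is decided by `Id` alone: `Person`, `PersonIdComparer`, `Customer` and `OptionsDemo` are made-up names for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Option 1: the element type is left untouched; equality lives in a separate comparer.
public class Person
{
    public int Id { get; }
    public Person(int id) { Id = id; }
}

public sealed class PersonIdComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Person obj) => obj.Id.GetHashCode();
}

// Option 2: the type defines its own equality, which the default comparer picks up.
public sealed class Customer : IEquatable<Customer>
{
    public int Id { get; }
    public Customer(int id) { Id = id; }

    public bool Equals(Customer other) => other != null && Id == other.Id;
    public override bool Equals(object obj) => Equals(obj as Customer);
    public override int GetHashCode() => Id.GetHashCode();
}

public static class OptionsDemo
{
    public static void Main()
    {
        var byComparer = new HashSet<Person>(new PersonIdComparer());
        byComparer.Add(new Person(1));
        byComparer.Add(new Person(1));   // rejected: the comparer says the Ids match

        var byType = new HashSet<Customer>();
        byType.Add(new Customer(1));
        byType.Add(new Customer(1));     // rejected: Customer.Equals/GetHashCode match

        var byReference = new HashSet<Person>();
        byReference.Add(new Person(1));
        byReference.Add(new Person(1));  // kept: Person has no equality of its own

        Console.WriteLine($"{byComparer.Count} {byType.Count} {byReference.Count}");
    }
}
```

The third set shows the situation from the question: without a comparer or overrides, two distinct instances are never equal, whatever `CompareTo` says.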

Note how none of this is in terms of an ordered comparison - which makes sense, as there are certainly situations where you can easily specify equality but not a total ordering. This is all the same as Dictionary<TKey, TValue>, basically.
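
To illustrate the Dictionary<TKey, TValue> remark, here is a minimal sketch with a hypothetical key type `ProductKey` (not from the question). It assumes no comparer is passed to the dictionary, so the default equality comparer and therefore the key's own Equals and GetHashCode are used:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type with value-style equality on Id.
public sealed class ProductKey : IEquatable<ProductKey>
{
    public int Id { get; }
    public ProductKey(int id) { Id = id; }

    public bool Equals(ProductKey other) => other != null && Id == other.Id;
    public override bool Equals(object obj) => Equals(obj as ProductKey);
    public override int GetHashCode() => Id.GetHashCode();
}

public static class DictionaryDemo
{
    public static void Main()
    {
        var names = new Dictionary<ProductKey, string>();
        names[new ProductKey(1)] = "first";
        names[new ProductKey(1)] = "second";  // equal key, so this overwrites the entry

        Console.WriteLine(names.Count);
        Console.WriteLine(names[new ProductKey(1)]);
    }
}
```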

If you want a set which uses ordering instead of just equality comparisons, you should use SortedSet<T> from .NET 4 - which allows you to specify an IComparer<T> instead of an IEqualityComparer<T>. This will use IComparer<T>.Compare - which will delegate to IComparable<T>.CompareTo or IComparable.CompareTo if you're using Comparer<T>.Default.
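
A small sketch of the SortedSet<T> route, assuming the element type implements IComparable<T> and no comparer is passed. `Item` and `SortedSetDemo` are hypothetical names standing in for the question's class `a`:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's class: ordered by Id only.
public class Item : IComparable<Item>
{
    public int Id { get; }
    public Item(int id) { Id = id; }

    public int CompareTo(Item other) => other is null ? 1 : Id.CompareTo(other.Id);
}

public static class SortedSetDemo
{
    public static void Main()
    {
        var set = new SortedSet<Item>();
        set.Add(new Item(2));
        set.Add(new Item(1));

        // CompareTo returns 0 against the existing Item(1), so this is treated as a duplicate.
        bool added = set.Add(new Item(1));

        Console.WriteLine(set.Count);  // 2
        Console.WriteLine(added);      // False
        foreach (var item in set)
            Console.WriteLine(item.Id);  // 1 then 2: iteration follows the ordering
    }
}
```

Note that Item overrides neither Equals nor GetHashCode here; SortedSet<T> relies only on the comparison.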

Jon Skeet
  • 7
    +1 Also note @tyriker's answer (that IMO should be a comment here) which points out that the simplest way to leverage said `IEqualityComparer.GetHashCode/Equals()` is to implement `Equals` and `GetHashCode` on `T` itself (and while you're doing that, you'd also implement the strongly typed counterpart:- `bool IEquatable.Equals(T other)` ) – Ruben Bartelink May 16 '13 at 09:05
  • 5
    Although very accurate this answer may be somewhat confusing, especially for new users as it doesn't clearly state that for the simplest case overriding `Equals` and `GetHashCode` is enough - as mentioned in @tyriker's answer. – BartoszKP Oct 02 '13 at 07:35
  • Imo once you implement `IComparable` (or `IComparer` for that matter) you shouldn't be asked to implement equality separately (but just `GetHashCode`). In a sense the comparability interfaces should inherit from equality interfaces. I do understand the performance benefits in having two separate functions (where you can optimize equality separately just by saying if something is equal or not) but still.. Very confusing otherwise when you have specified when the instances are equal in `CompareTo` function and framework wont consider that. – nawfal Jun 27 '15 at 15:03
  • @nawfal not everything has a logical order. if you're comparing two things that contain a bool property it is just plain awful to have to write something like `a.boolProp == b.boolProp ? 1 : 0` or should it be `a.boolProp == b.boolProp ? 0 : -1` or `a.boolProp == b.boolProp ? 1 : -1`. Yuk! – Simon_Weaver Feb 01 '17 at 01:42
  • 1
    @Simon_Weaver it is. I do want to somehow avoid it in my hypothetical feature I was proposing. – nawfal Feb 01 '17 at 03:24
  • @JonSkeet congrats on just hitting the 1 million rep. Why does your answer not work but @tyriker's does, i.e. don't implement `IEqualityComparer`? I found with `IEqualityComparer` that `Equals(T a, T b)` was never called. But the overridden version is. – HankCa Feb 01 '18 at 01:04
  • @HankCa: Well are you actually passing the comparer into the constructor? It should work fine. I wouldn't have posted it if it didn't work. – Jon Skeet Feb 01 '18 at 06:56
  • @HankCa: I'm clarifying the answer to make that clearer though. – Jon Skeet Feb 01 '18 at 07:07
  • Ahh yes, thanks @JonSkeet. It's been a while since I've done this in Java (and I'm new to C#). What I was thinking of was Java's `Comparable`, where you implement an `int compareTo(T t)`, and thinking C#'s `IComparable` was the same. `IComparable` is in fact just like Java's `Comparator` that, like you said, is passed as an argument to the HashSet. – HankCa Feb 08 '18 at 03:33
  • To extend @Tyriker s "Instead of", you could have `public class a : IEqualityComparer {`, and then `new HashSet(a)` – HankCa Feb 08 '18 at 03:37
  • @HankCa: No, you'd almost never implement IEqualityComparer for the same class as is implementing it. You'd implement IEquatable instead. Also you don't pass a Comparator to a Java HashSet, for the same reasons... You'd pass one to a TreeSet instead, as that's ordered. – Jon Skeet Feb 08 '18 at 06:07
  • @HankCa: Comparator doesn't have compareTo, it has compare - it's important to distinguish between "I can compare two values" and "I can compare another value to myself" – Jon Skeet Feb 08 '18 at 06:08
  • From [your comment above](https://stackoverflow.com/questions/8952003/how-does-hashset-compare-elements-for-equality/15413983#comment84109697_8952026) it seems like it's required to pass the `IEqualityComparer` to the HashSet's constructor (I assume that that's what you meant, and not the Type's ctor). Why would that be needed? Wouldn't the HashSet call the `IEqualityComparer` anyway if the Type implements it? – ispiro Mar 19 '20 at 20:08
  • @ispiro: Which type? The element type? A type normally doesn't implement `IEqualityComparer` of itself, but of a *different* type. That's the difference between `IEqualityComparer` and `IEquatable`. The default comparer will use an `IEquatable` implementation, but the point is to be able to specify a different comparer if you want to. – Jon Skeet Mar 19 '20 at 20:24
  • @JonSkeet Thanks. So implementing `IEquatable` and overriding `Equals` and `GetHashCode` _will_ be enough without needing to pass an `IEqualityComparer` to the HashSet. Got it. Thanks again. – ispiro Mar 19 '20 at 21:51
  • @ispiro: For a type that you control, if you only want one type of equality comparison, yes. (Note that this is covered by the second bullet point in the answer.) – Jon Skeet Mar 20 '20 at 07:06

Here's clarification on a part of the answer that's been left unsaid: The object type of your HashSet<T> doesn't have to implement IEqualityComparer<T> but instead just has to override Object.GetHashCode() and Object.Equals(Object obj).

Instead of this:

public class a : IEqualityComparer<a>
{
  public int GetHashCode(a obj) { /* Implementation */ }
  public bool Equals(a obj1, a obj2) { /* Implementation */ }
}

You do this:

public class a
{
  public override int GetHashCode() { /* Implementation */ }
  public override bool Equals(object obj) { /* Implementation */ }
}

It is subtle, but this tripped me up for the better part of a day trying to get HashSet to function the way it is intended. And like others have said, HashSet<a> will end up calling a.GetHashCode() and a.Equals(obj) as necessary when working with the set.
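
A hedged sketch of why the first form fails silently: a type that implements IEqualityComparer<T> for itself is ignored unless an instance is actually passed to the HashSet constructor. `Widget` and `ComparerNotPassedDemo` are hypothetical names used only for this illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type that (unusually) implements IEqualityComparer<T> for itself.
// These are overloads, not overrides: object.Equals(object) and
// object.GetHashCode() are left untouched.
public class Widget : IEqualityComparer<Widget>
{
    public int Id { get; }
    public Widget(int id) { Id = id; }

    public bool Equals(Widget x, Widget y) => x.Id == y.Id;
    public int GetHashCode(Widget obj) => obj.Id.GetHashCode();
}

public static class ComparerNotPassedDemo
{
    public static int CountWithDefaultComparer()
    {
        // No comparer passed: the default comparer falls back to reference equality.
        var set = new HashSet<Widget>();
        set.Add(new Widget(1));
        set.Add(new Widget(1));
        return set.Count;
    }

    public static int CountWithExplicitComparer()
    {
        // Any Widget instance can act as the comparer once it is passed in.
        var set = new HashSet<Widget>(new Widget(0));
        set.Add(new Widget(1));
        set.Add(new Widget(1));
        return set.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithDefaultComparer());   // 2
        Console.WriteLine(CountWithExplicitComparer());  // 1
    }
}
```

The second call works, but a separate comparer class or overriding Equals and GetHashCode on the type is the more conventional design.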

Stefanvds
tyriker
  • 2
    Good point. BTW as mentioned on my comment on @JonSkeet's answer, you should also implement `bool IEquatable.Equals(T other)` for a slight efficiency gain but more importantly the clarity benefit. For obv reasons, in addition to the need to implement `GetHashCode` alongside `IEquatable`, the doc for IEquatable mentions that for consistency purposes you should also override the `object.Equals` for consistency – Ruben Bartelink May 16 '13 at 09:09
  • I tried implementing this. The `override GetHashCode` works, but `override bool Equals` gets the error: no method found to override. Any idea? – Stefanvds Dec 12 '14 at 08:36
  • Finally the info i was looking for. Thank you. – Mauro Sampietro Jan 03 '17 at 15:38
  • From my comments on above answer - In your "Instead of" case, you could have `public class a : IEqualityComparer {`, and then `new HashSet(a)`. – HankCa Feb 08 '18 at 03:52
  • But see Jon Skeets comments above. – HankCa Feb 09 '18 at 12:10
  • Be careful! This will only work with the default `System.Collections.Generic.GenericEqualityComparer`. If your `HashSet` is materialized by Entity Framework, it will have `System.Data.Entity.Infrastructure.ObjectReferenceEqualityComparer`. – Monsignor Aug 29 '18 at 13:46
  • So is `IEqualityComparer` used by the dictionary or does override work for dictionary too ? – WDUK Apr 21 '22 at 06:07
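  • @WDUK `Dictionary` also compares keys through an `IEqualityComparer`, but unless you pass one to its constructor it uses `EqualityComparer<TKey>.Default`, which calls the key's `IEquatable<T>.Equals` or overridden `Object.Equals` together with `GetHashCode`. So the overrides work for dictionary keys too. [source](https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.iequalitycomparer-1?view=net-6.0#remarks) – tyriker May 05 '22 at 17:12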
  • You do understand that IEqualityComparer is not to be implemented by the object but actually is a separate class right? If the user can access the object code, then the interface to implement is IEquatable<>. – Lucas Montenegro Carvalhaes Mar 20 '23 at 19:00

HashSet uses Equals and GetHashCode().

CompareTo is for ordered sets.

If you want unique objects, but you don't care about their iteration order, HashSet<T> is typically the best choice.

CodesInChaos

The HashSet constructor can take an object that implements IEqualityComparer<T>, which it then uses when adding new objects. If you want HashSet to use the element type's own methods instead, you need to override Equals and GetHashCode.

using System;
using System.Collections.Generic;
using System.Linq;

namespace HashSet
{
    public class Employe
    {
        public Employe() {
        }

        public string Name { get; set; }

        public override string ToString()  {
            return Name;
        }

        public override bool Equals(object obj) {
            // Pattern matching returns false for null or a different type instead of throwing.
            return obj is Employe other && this.Name == other.Name;
        }

        public override int GetHashCode() {
            return this.Name.GetHashCode();
        }
    }

    class EmployeComparer : IEqualityComparer<Employe>
    {
        public bool Equals(Employe x, Employe y)
        {
            return x.Name.Trim().ToLower().Equals(y.Name.Trim().ToLower());
        }

        public int GetHashCode(Employe obj)
        {
            // Must agree with Equals: hash the same normalized name that Equals compares.
            return obj.Name.Trim().ToLower().GetHashCode();
        }
    }
    class Program
    {
        static void Main(string[] args)
        {
            HashSet<Employe> hashSet = new HashSet<Employe>(new EmployeComparer());
            hashSet.Add(new Employe() { Name = "Nik" });
            hashSet.Add(new Employe() { Name = "Rob" });
            hashSet.Add(new Employe() { Name = "Joe" });
            Display(hashSet);
            hashSet.Add(new Employe() { Name = "Rob" });
            Display(hashSet);

            HashSet<Employe> hashSetB = new HashSet<Employe>(new EmployeComparer());
            hashSetB.Add(new Employe() { Name = "Max" });
            hashSetB.Add(new Employe() { Name = "Solomon" });
            hashSetB.Add(new Employe() { Name = "Werter" });
            hashSetB.Add(new Employe() { Name = "Rob" });
            Display(hashSetB);

            var union = hashSet.Union<Employe>(hashSetB).ToList();
            Display(union);
            var inter = hashSet.Intersect<Employe>(hashSetB).ToList();
            Display(inter);
            var except = hashSet.Except<Employe>(hashSetB).ToList();
            Display(except);

            Console.ReadKey();
        }

        static void Display(HashSet<Employe> hashSet)
        {
            if (hashSet.Count == 0)
            {
                Console.WriteLine("Collection is Empty");
                return;
            }
            foreach (var item in hashSet)
            {
                Console.Write("{0}, ", item);
            }
            Console.Write("\n");
        }

        static void Display(List<Employe> list)
        {
            if (list.Count == 0)
            {
                Console.WriteLine("Collection is Empty");
                return;
            }
            foreach (var item in list)
            {
                Console.Write("{0}, ", item);
            }
            Console.Write("\n");
        }
    }
}
pwb
Nikolai Nechai

I came here looking for answers, but found that all the answers had too much info or not enough, so here is my answer...

Since you've created a custom class you need to implement GetHashCode and Equals. In this example I will use a class Student instead of a because it's easier to follow and doesn't violate any naming conventions. Here is what the implementations look like:

public override bool Equals(object obj)
{
    return obj is Student student && Id == student.Id;
}

public override int GetHashCode()
{
    return HashCode.Combine(Id);
}
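
When equality depends on more than one property, the same pattern extends naturally. The following is only a sketch with a hypothetical `Enrollment` type (not part of the question); it assumes `System.HashCode` is available, which as far as I know means .NET Core 2.1 or later, or the Microsoft.Bcl.HashCode package on older frameworks:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type whose identity is the (StudentId, Course) pair.
public sealed class Enrollment : IEquatable<Enrollment>
{
    public int StudentId { get; }
    public string Course { get; }

    public Enrollment(int studentId, string course)
    {
        StudentId = studentId;
        Course = course;
    }

    public bool Equals(Enrollment other) =>
        other != null && StudentId == other.StudentId && Course == other.Course;

    public override bool Equals(object obj) => Equals(obj as Enrollment);

    // Hash exactly the fields that Equals compares, so equal objects share a hash code.
    public override int GetHashCode() => HashCode.Combine(StudentId, Course);
}

public static class EnrollmentDemo
{
    public static void Main()
    {
        var set = new HashSet<Enrollment>();
        set.Add(new Enrollment(1, "Math"));
        set.Add(new Enrollment(1, "Math"));     // duplicate pair, skipped
        set.Add(new Enrollment(1, "Physics"));  // same student, different course, kept

        Console.WriteLine(set.Count);
    }
}
```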

I stumbled across this article from Microsoft that gives an incredibly easy way to implement these if you're using Visual Studio. In case it's helpful to anyone else, here are complete steps for using a custom data type in a HashSet using Visual Studio:

Given a class Student with two simple properties and a constructor:

public class Student
{
    public int Id { get; set; }
    public string Name { get; set; }

    public Student(int id)
    {
        this.Id = id;
    }
}

To implement IComparable, add : IComparable<Student> like so:

public class Student : IComparable<Student>

You will see a red squiggly appear with an error message saying your class doesn't implement IComparable. Click on suggestions or press Alt+Enter and use the suggestion to implement it.

[Screenshot: using the suggestion to implement IComparable]

You will see the method generated. You can then write your own implementation like below:

public int CompareTo(Student student)
{
    return this.Id.CompareTo(student.Id);
}

In the above implementation only the Id property is compared; Name is ignored. Next, right-click in your code and select Quick Actions and Refactorings, then Generate Equals and GetHashCode.

[Screenshot: the Generate Equals and GetHashCode menu item]

A window will pop up where you can select which properties to use for hashing and even implement IEquatable if you'd like:

[Screenshot: dialog for selecting which properties to use for hashing]

Here is the generated code:

public class Student : IComparable<Student>, IEquatable<Student> {
    ...
    public override bool Equals(object obj)
    {
        return Equals(obj as Student);
    }

    public bool Equals(Student other)
    {
        return other != null && Id == other.Id;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(Id);
    }
}

Now if you try to add a duplicate item like shown below it will be skipped:

static void Main(string[] args)
{
    Student s1 = new Student(1);
    Student s2 = new Student(2);
    HashSet<Student> hs = new HashSet<Student>();

    hs.Add(s1);
    hs.Add(s2);
    hs.Add(new Student(1)); //will be skipped
    hs.Add(new Student(3));
}

You can now use .Contains like so:

for (int i = 0; i <= 4; i++)
{
    if (hs.Contains(new Student(i)))
    {
        Console.WriteLine($@"Set contains student with Id {i}");
    }
    else
    {
        Console.WriteLine($@"Set does NOT contain a student with Id {i}");
    }
}

Output:

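Set does NOT contain a student with Id 0
Set contains student with Id 1
Set contains student with Id 2
Set contains student with Id 3
Set does NOT contain a student with Id 4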

SendETHToThisAddress
  • 1
    Fantastic thank you. I was struggling a bit with the other answers and then as you point out, it's built into Visual Studio anyway – mejobloggs Feb 06 '22 at 06:46