1

I have a IEnumerable of objects that have redefined GetHashCode method. I assumed that if I add those objects to HashSet<T>, it would hold only the unique objects. But it doesn't:

var set = new HashSet<SomeObject>();
Count = 0
set.Add(first);
true
set.Add(second);
true
set.Count
2
first.GetHashCode()
-927637658
second.GetHashCode()
-927637658

So how could I reduce my IEnumerable structure of objects to those that are unique based on their GetHashCode() value.

Although I don't know if this helps in any way:

public class SomeObject
{
    ...
    public string GetAggregateKey()
    {
        var json = ToJson();
        json.Property("id").Remove();
        return json.ToString(); // without the `id`, the json string of two separate objects with same content could be the same
    }

    override public int GetHashCode()
    {
        // two equal strings have same hash code
        return GetAggregateKey().GetHashCode();
    }
    ...
}
ddinchev
  • 33,683
  • 28
  • 88
  • 133
  • 2
    We need to see the implementation of your `SampleObject` type, especially `GetHashCode` (and `Equals`). – Christian.K Nov 05 '15 at 09:29
  • @Christian.K: don't see why that would be needed. – Thomas Weller Nov 05 '15 at 09:32
  • Because your implementation of `GetHashCode` might not be correct? – Christian.K Nov 05 '15 at 09:32
  • You could use a `Dictionary` and store the hash code as the key. – Thomas Weller Nov 05 '15 at 09:32
  • As I understand the question, it is by purpose that the hash code generates collisions. – Thomas Weller Nov 05 '15 at 09:34
  • Well, my `GetHashCode()` returns an int and HashSet doesn't "ignore" second object with same int from GetHashCode - isn't that enough? I don't have `Equals` implementation though. @Thomas yes, the idea is to generate collisions when objects have same content. – ddinchev Nov 05 '15 at 09:35
  • 4
    The hash code alone is used to quickly find potential candidates, but the final say is done by the `Equals` method. How is that defined? – Lasse V. Karlsen Nov 05 '15 at 09:35
  • @LasseV.Karlsen aaah, okay, that seems to be the potential problem. Could you add it as answer :) – ddinchev Nov 05 '15 at 09:36
  • 1
    Maybe this is the answer: http://stackoverflow.com/a/8952026/226277 – Stéphane Bebrone Nov 05 '15 at 09:37
  • 1
    Using a hashcode as a measure of uniqueness is misguided. Even if your implementation returns the same value for every object, this will simply cause hash table implementations to operate highly inefficiently, not incorrectly. You must also provide equality members as it is these that are used by hash table to differentiate between items that return the same hashcode. One without the other is an incomplete implementation in your code. Suggest you study how hash tables actually work. – spender Nov 05 '15 at 09:39
  • ... So think of the hashcode as a quick and dirty way to measure equality. A different hashcode means that the objects are unequal, but the same hashcode does not guarantee that the objects are equal, and it's at this point that you'd fall back on the equality members to get a definitive answer. – spender Nov 05 '15 at 09:42
  • See also: [Guidelines and rules for GetHashCode](http://blogs.msdn.com/b/ericlippert/archive/2011/02/28/guidelines-and-rules-for-gethashcode.aspx) – Corak Nov 05 '15 at 09:43

2 Answers2

4

It is not enough to only have a GetHashCode method.

The GetHashCode method is used to quickly figure out if there are potential candidates already in the hashset (or dictionary):

  • If no existing object in the hashset has the same hash code, the new one is not a duplicate
  • If any existing object(s) in the hashset has the same hash code, the new one is a potential duplicate

To figure out if it is just a potential duplicate or an actual duplicate, Equals is used.

If you haven't implemented that then the object.Equals method will be used, which is simply comparing references. Two distinct objects will thus never be equal, even though they may both have the same property values and the same hash code.

The solution: Implement Equals with the same rules as the GetHashCode, or provide a IEqualityComparer<T> implementation to your hashset.

Lasse V. Karlsen
  • 380,855
  • 102
  • 628
  • 825
1

Have a look at the Reference Source for HashSet: This line (960, and those around it) is what you're looking for:

if (m_slots[i].hashCode == hashCode && m_comparer.Equals(m_slots[i].value, value))

The hash of the object is only used to decide which bucket the object goes into. If Equals returns false for the two objects, the new one will still be inserted.

germi
  • 4,628
  • 1
  • 21
  • 38