5

Is this the best way to create a comparer for the equality of two dictionaries? This needs to be exact. Note that Entity.Columns is a dictionary of KeyValuePair(string, object) :

public class EntityColumnCompare : IEqualityComparer<Entity>
{
    public bool Equals(Entity a, Entity b)
    {
        var aCol = a.Columns.OrderBy(KeyValuePair => KeyValuePair.Key);
        var bCol = b.Columns.OrderBy(KeyValuePAir => KeyValuePAir.Key); 

        if (aCol.SequenceEqual(bCol))
            return true;
        else
            return false;           
    }

    public int GetHashCode(Entity obj)
    {
        return obj.Columns.GetHashCode(); 
    }
}

Also not too sure about the GetHashCode implementation.

Thanks!

Sean Thoman
  • 7,429
  • 6
  • 56
  • 103

3 Answers3

8

Here's what I would do:

    public bool Equals(Entity a, Entity b)
    {
        if (a.Columns.Count != b.Columns.Count)
            return false; // Different number of items

        foreach(var kvp in a.Columns)
        {
            object bValue;
            if (!b.Columns.TryGetValue(kvp.Key, out bValue))
                return false; // key missing in b
            if (!Equals(kvp.Value, bValue))
                return false; // value is different
        }
        return true;
    }

That way you don't need to order the entries (which is a O(n log n) operation) : you only need to enumerate the entries in the first dictionary (O(n)) and try to retrieve values by key in the second dictionary (O(1)), so the overall complexity is O(n).

Also, note that your GetHashCode method is incorrect: in most cases it will return different values for different dictionary instances, even if they have the same content. And if the hashcode is different, Equals will never be called... You have several options to implement it correctly, none of them ideal:

  • build the hashcode from the content of the dictionary: would be the best option, but it's slow, and GetHashCode needs to be fast
  • always return the same value, that way Equals will always be called: very bad if you want to use this comparer in a hashtable/dictionary/hashset, because all instances will fall in the same bucket, resulting in O(n) access instead of O(1)
  • return the Count of the dictionary (as suggested by digEmAll): it won't give a great distribution, but still better than always returning the same value, and it satisfies the constraint for GetHashCode (i.e. objects that are considered equal should have the same hashcode; two "equal" dictionaries have the same number of items, so it works)
Thomas Levesque
  • 286,951
  • 70
  • 623
  • 758
  • Actually, I don't think one can do much more in GetHashCode() implementation for dictionaries or in general for lists. One could compute the hash e.g. using the first n elements: this will improve the distribuition by a n factor (i.e. buckets averagely n-times smaller), but also it will increase GetHashCode complexity by n. So basically no advantage... – digEmAll Mar 23 '11 at 21:41
  • @digEmAll, don't discard that idea so soon... I didn't do the math, but I think it might work – Thomas Levesque Mar 23 '11 at 22:04
  • of course I did my math very roughly, but I suspect that's quite correct... BTW, this argument could deserve a question :) – digEmAll Mar 23 '11 at 22:19
  • Another approach if you control all changes to the dictionary would be to keep a cumulatively-updated hashcode. For example, you could keep four int's holding the sum and 'xor' of the hashcodes of all the keys in the dictionary, and of all the values in the dictionary. The dictionary's GetHashCode would return a munge of those four int values. – supercat Mar 25 '11 at 19:42
2

Something like this comes to mind, but there might be something more efficient:

public static bool Equals<TKey, TValue>(IDictionary<TKey, TValue> x, 
    IDictionary<TKey, TValue> y)
{
    return x.Keys.Intersect(y.Keys).Count == x.Keys.Count &&
        x.Keys.All(key => Object.Equals(x[key], y[key]));
}
Richard Szalay
  • 83,269
  • 19
  • 178
  • 237
1

It seems good to me, perhaps not the fastest but working.

You just need to change the GetHashCode implementation that is wrong.

For example you could return obj.Columns.Count.GetHashCode()

digEmAll
  • 56,430
  • 9
  • 115
  • 140
  • 1
    @Sean: It's exactly the same, in fact the implementation of `int.GetHashCode()` it's: `returns this;`. You can leave `GetHashCode()` because it's stylistically better IMO, and probably JIT compiler will inline the code saving the call to GetHashCode :) – digEmAll Mar 23 '11 at 21:46