3

I am storing a 2D array that represents a distance matrix between vectors as a Dictionary<DistanceCell, double>. My implementation of DistanceCell has two string fields representing the vectors being compared.

class DistanceCell
{
    public string Group1 { get; private set; }
    public string Group2 { get; private set; }

    public DistanceCell(string group1, string group2)
    {
        if (group1 == null)
        {
            throw new ArgumentNullException("group1");
        }

        if (group2 == null)
        {
            throw new ArgumentNullException("group2");
        }

        this.Group1 = group1;
        this.Group2 = group2;
    }
}

Since I'm using this class as a key, I overrode Equals() and GetHashCode():

public override bool Equals(object obj)
{
    // False if the object is null
    if (obj == null)
    {
        return false;
    }

    // Try casting to a DistanceCell. If it fails, return false;
    DistanceCell cell = obj as DistanceCell;
    if (cell == null)
    {
        return false;
    }

    return (this.Group1 == cell.Group1 && this.Group2 == cell.Group2) 
           || (this.Group1 == cell.Group2 && this.Group2 == cell.Group1);
}

public bool Equals(DistanceCell cell)
{
    if (cell == null)
    {
        return false;
    }

    return (this.Group1 == cell.Group1 && this.Group2 == cell.Group2) 
           || (this.Group1 == cell.Group2 && this.Group2 == cell.Group1);
}

public static bool operator ==(DistanceCell a, DistanceCell b)
{
    // If both are null, or both are same instance, return true.
    if (System.Object.ReferenceEquals(a, b))
    {
        return true;
    }

    // If either is null, return false.
    // Cast a and b to objects to check for null to avoid calling this operator method
    // and causing an infinite loop.
    if ((object)a == null || (object)b == null)
    {
        return false;
    }

    return (a.Group1 == b.Group1 && a.Group2 == b.Group2) 
           || (a.Group1 == b.Group2 && a.Group2 == b.Group1);
}

public static bool operator !=(DistanceCell a, DistanceCell b)
{
    return !(a == b);
}

public override int GetHashCode()
{
    int hash;
    unchecked
    {
        hash = Group1.GetHashCode() * Group2.GetHashCode();
    }

    return hash;
}

As you can see, one of the requirements of DistanceCell is that Group1 and Group2 are interchangeable. So for two strings x and y, DistanceCell("x", "y") must equal DistanceCell("y", "x"). This is why I implemented GetHashCode() with multiplication because DistanceCell("x", "y").GetHashCode() must equal DistanceCell("y", "x").GetHashCode().

The problem that I'm having is that it works correctly roughly 90% of the time, but it throws a KeyNotFoundException or a NullReferenceException the remainder of the time. The former is thrown when getting a key from the dictionary, and the latter when I iterate over the dictionary with a foreach loop and it retrieves a key that is null that it then tries to call Equals() on. I suspect this has something to do with an error in my GetHashCode() implementation, but I'm not positive. Also note that there should never be a case where the key wouldn't exist in the dictionary when I check it because of the nature of my algorithm. The algorithm takes the same path every execution.

Update

I just wanted to update everyone that the problem is fixed. It turns out that it had nothing to do with my Equals() or GetHashCode() implementation. I did some extensive debugging and found that the reason I was getting a KeyNotFoundException was because the key didn't exist in the dictionary in the first place, which was strange because I was positive that it was being added. The problem was that I was adding the keys to the dictionary with multiple threads, and according to this, the c# Dictionary class isn't thread-safe. So the timing must have been perfect that an Add() failed and thus the key was never added to the dictionary. I believe this can also explain how the foreach loop was bringing up a null key on occasion. The Add()'s from multiple threads must have messed up some internal structures in the dictionary and introduced a null key.

Thanks to every one for your help! I'm sorry that it ended up being a fault entirely on my end.

StrangerLoop
  • 39
  • 1
  • 4
  • Your `DistanceCell` class is not `sealed`. Are you sure no-one derives from `DistanceCell`? Your `Equals` methods don't check if `obj` or `cell` is of a _more_ derived type than `this` (or vice versa). You should include those checks, or mark your class `sealed`. – Jeppe Stig Nielsen Jan 20 '13 at 21:09
  • @Jeppe I don't have any classes that derive from `DistanceCell`, so this shouldn't be a problem, but I'll mark it as `sealed` to be safe. – StrangerLoop Jan 20 '13 at 21:16
  • I had this problem, but the actual reason was that i had two different types of spaces, caused by copy-pasting the code in and out of an excel cell. – will Oct 03 '16 at 11:37

3 Answers3

4

Take a look at this blog post by Eric Lippert

It says that the result of GetHashCode should never change even if you change the content of the object. The reason is that Dictionary is using buckets for faster indexing. Once you change the GetHasCode result, the Dictionary will not be able to find the right bucket for your object and this can lead to "Key not found"

I might be wrong, but it is worth testing.

Ondra
  • 1,619
  • 13
  • 27
  • This is where I would start looking. `GetHashCode()` should __never__ compute based on _mutable_ fields. Actually, I would avoid overriding it unless you are 110% sure what you are doing. The default is an auto-incremented value that should work just fine. – Erik Jan 20 '13 at 20:22
  • This might be the correct answer. Even if the setters of the properties are `private`. The question is: Are `Group1` and `Group2` ever changed outside the constructor? It looks as if the asker showed us the full code of the class? – Jeppe Stig Nielsen Jan 20 '13 at 20:28
  • 1
    @SiLo My `DistanceCell` object shouldn't be mutable though. The only fields that it has are the two strings `Group1` and `Group2`, which are only set when the constructor is called. – StrangerLoop Jan 20 '13 at 20:32
  • @Jeppe Yes, this was the full code of my `DistanceCell` class. – StrangerLoop Jan 20 '13 at 20:33
  • @StrangerLoop Oh, you are correct. I misread them as custom classes named `Group1` and `Group2`, assuming they might have their own `GetHashCode()` implementations. My apologies, I'll take another look. That being said, I'd still recommend you remove your `GetHashCode()` override for the `DistanceCell` class. – Erik Jan 20 '13 at 20:35
  • @SiLo I can't remove it because one of the requirements I have is that for two strings `"x"` and `"y"`, `DistanceCell("x","y")` and `DistanceCell("y","x")` must be equal and return the same hash code. – StrangerLoop Jan 20 '13 at 20:49
2

I believe what is missing for you is this answer Using an object as a generic Dictionary key

You probably not stating you implement IEquatable interface

Community
  • 1
  • 1
shahar eldad
  • 861
  • 1
  • 12
  • 31
0

To me your code looks correct. (It can possibly be (micro-)optimized, but it looks as if it will be correct in all cases.) Can you provide some code where you create and use the Dictionary<DistanceCell, double>, and things go bad? The problem may lie in code you haven't shown us.

Jeppe Stig Nielsen
  • 60,409
  • 11
  • 110
  • 181
  • The code is quite long and there's a lot of steps that happen to calculate the vectors before I use the distance matrix. I am going to do some more testing, and if I can't find anything, I'll post code that could replicate this issue. – StrangerLoop Jan 20 '13 at 21:37
  • 1
    You were right, the code here was correct. The problem was that I was adding to the dictionary from multiple threads which caused a key to be missing on occasion. Thanks for all your attempts at helping me figure this out! – StrangerLoop Jan 21 '13 at 17:08
  • Ah, someone should have mentioned "`Dictionary<,>` is not thread-safe", but a good thing you found out (the hard way). – Jeppe Stig Nielsen Jan 21 '13 at 21:54