3

I have implemented the class below:

public class carComparer : IEqualityComparer<Car>
    {
        public bool Equals(Car car1, Car car2)
        {
                if (car1 == null || car2 == null)
                    return false;

                return (car1.description == car2.description);
        }

        public int GetHashCode(Car car)
        {
            unchecked 
            {
                int hash = 17;
                hash = hash * 29 + car.id.GetHashCode();
                hash = hash * 29 + car.description.GetHashCode();
                return hash;
            }
        }

    }

Now see this:

Car p1 = new Car() { id = Guid.NewGuid(), description = "Test1" };
        Car p2 = new Car() { id = Guid.NewGuid(), description = "Test1" };
        Car p3 = new Car() { id = Guid.NewGuid(), description = "Test1" };
        Car p4 = new Car() { id = Guid.NewGuid(), description = "Test1" };
        var hash = new HashSet<Car>();
        hash.Add(p1);
        hash.Add(p2);

        var hash2 = new HashSet<Car>();
        hash2.Add(p3);
        hash2.Add(p4);

        var carComparer = new CarComparer();
        Assert.That(hash, Is.EquivalentTo(hash2).Using(carComparer));

I put breakpoints in .equals and .hashcode. Equals is used; but GetHashCode is not. Why?

w0051977
  • 15,099
  • 32
  • 152
  • 329
  • What is `Car` (and/or `car`, are they equivalent)? Where exactly did you expect `GetHashCode` to be called? Why? – Patrick Hofman Aug 18 '17 at 09:25
  • At what point was equals used? Putting your code into Linqpad doesn't call either method for me which is to be expected since you aren't telling the hashset to use your comparer. If I do `var hash = new HashSet(new carComparer());` then it calls `GetHashCode` twice, I assume once for each object that you add... – Chris Aug 18 '17 at 09:26
  • And why should it be used? You are not calling it anywhere. You are calling GetHashCode() ion car.id and car.description, but not on car. – VDN Aug 18 '17 at 09:30
  • @VDN, I am calling it in the Assert. – w0051977 Aug 18 '17 at 09:30

3 Answers3

4

You are comparing two HashSet using NUnit Is.EquivalentTo. There is no reason for it to call GetHashCode - it basically compares two collections for equality of its members. That's why GetHashCode is never called and Equals is called to compare two items from different HashSets for equality. Your hashsets could as well be lists or any other enumerable - that doesn't change anything when comparing two collections.

You might expect GetHashCode to be called when you add item to HashSet - but it's not so, because at this point your carComparer is not yet known - you don't pass it to HashSet constructor. If you will do it like this:

var hash = new HashSet<Car>(new carComparer());

Then GetHashCode would be called when you add new item to corresponding HashSet.

Evk
  • 98,527
  • 8
  • 141
  • 191
  • When is GetHashCode called? Is it called when an item is added to the collection or when the .equals is called. Sorry for the confusion. – w0051977 Aug 18 '17 at 09:33
  • Thanks. What do you think of the quality of my hashcode? I used the answer to this question as a template: https://stackoverflow.com/questions/1180044/should-one-override-equals-method-for-asserting-the-object-equality-in-a-unit-te +1 for the clarification, which makes sense. – w0051977 Aug 18 '17 at 09:39
  • Looks ok for me. Note that you might override `GetHashCode` and `Equals` of `car` object itself, then you won't need to have pass specific comparer every time. – Evk Aug 18 '17 at 09:42
  • it is used by a unit test only. Therefore I do not think it is a good idea to override Car. Thanks again. – w0051977 Aug 18 '17 at 09:46
  • 2
    It's quite strange that cars are considered equal by one condition in unit test and by other condition in production. But if that makes sense to you - why not? – Evk Aug 18 '17 at 09:47
  • Also is it "ok" to use the id (GUID) as part of a hashcode comparison? Notice that I do not use id in the .equals. All the examples I look at never use the id. Thanks again. – w0051977 Aug 18 '17 at 09:48
  • @w0051977: What you should be doing is working out what you consider equality to be first and then building up a hashcode from the fields in that equality. In your case for example your hash code is broken because two cars with the same description but different ids will be considered equal by your equality method but will generate different hashcodes. For your equality method your hashcode could just return `description.GetHashCode()`. However I assume that in fact different ids make different cars... – Chris Aug 18 '17 at 09:55
  • Also a note on your hashcode function - I'm not sure why you have the initial value of 17 and then do a multiply of it by 29. The only difference between what you have now and an initial seed of 0 is that your final hashcode will be 29*29*17 higher than if your initial seed was 0. This will be a constant increase so feels a little pointless to me. – Chris Aug 18 '17 at 09:57
  • As almost always with SO questions - you should post what problem are you solving, not just some random code. – Evk Aug 18 '17 at 10:04
  • @Chris, please see the answer from Jon Skeet in this question: https://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode. It tells you why I have used 17 and 29. – w0051977 Aug 18 '17 at 10:04
  • @w0051977: Thanks. Will ask Jon. :) – Chris Aug 18 '17 at 10:08
  • @Chris he multiplies with addition, it is not the same as just 29*29*17 – Evk Aug 18 '17 at 10:10
  • @evk: It is... In the above if the two hashcodes are a,b and the initial value is i then the hashcode is: (29 * (i*29+a))+b = which equals i*29*29+a*29+b. Or am I screwing up the maths here? – Chris Aug 18 '17 at 10:13
  • I've decided to just ask a new question about it in case anybody is interested: https://stackoverflow.com/questions/45754687/why-is-an-initial-prime-used-in-gethashcode-implementations – Chris Aug 18 '17 at 10:42
2

GetHashCode is typically used in a hash-table-lookup.

The GetHashCode does not have to be guaranteed unique and therefor is not a valid IsEqual test.

For the GetHashCode to be used, use this constructor of the HashSet:

https://msdn.microsoft.com/en-us/library/bb359100(v=vs.110).aspx

So in order to use the GetHashCode method, you'll need to use:

var hash = new HashSet<Car>(carComparer);

Note that the hash will be verified when adding your object to the hashset.

The Comparer is then used in the HashSet.Add method, within this call:

private int InternalGetHashCode(T item)
{
  if ((object) item == null)
    return 0;

  //this._comparer is set during the constructor call
  return this._comparer.GetHashCode(item) & int.MaxValue;
}

For obvious reasons this makes the Comparer a read only property.

So, to sum up;

The GetHashCode is not used because, since it is typically used in a hash-table-lookup-creation, you'll need to provide it to the HashSet before you start adding items.

The IsEqual is used due to obvious reasons; and if it's not: see @dasblinkenlight's answer.

Stefan
  • 17,448
  • 11
  • 60
  • 79
  • 2
    You seem to be missing the point of the question. He is putting things in a `HashSet` which is why he expects `GetHashCode` to be called but it isn't. Your answer doesn't seem to address this point, instead talking generalities about the two methods... – Chris Aug 18 '17 at 09:27
  • But, the `IEqualityComparer` is not bound to the HashSet in any way while it is while testing equality. – Stefan Aug 18 '17 at 09:31
  • Right, carComparer is used in Add method only if it is passed in HashSet constructor. – cdel Aug 18 '17 at 09:39
  • @Chris: Although you had a point about missing a crucial part of the question, I don't think your point is valid since the hashset is not bound to the comparer which is crucial while adding to (and therefor creating) the hashtable. Anyhow, I updated the answer with more details, I hope you will reconsider your comment ;-) – Stefan Aug 18 '17 at 10:35
  • @Stefan: I do think the answer is much better now, yes. – Chris Aug 18 '17 at 10:39
1

This happens because of the algorithm used in IsEquivalent to decide equivalency: the implementation constructs what they call a "collection tally" object from the collection that you expect, and then try removing items of the actual collection from it one-by-one:

public bool TryRemove(IEnumerable c) {
    foreach (object o in c)
        if (!TryRemove(o))
            return false;
    return true;
}

public bool TryRemove(object o) {
    for (int index = 0; index < list.Count; index++)
        if (ItemsEqual(list[index], o)) {
            list.RemoveAt(index);
            return true;
        }
    return false;
}

You can see that NUnit uses a relatively inefficient O(n2) algorithm instead of constructing a hash set for O(n) efficiency. This would matter for larger sets, but since a typical collection in a unit test has only a few items, there would be no noticeable difference.

ItemsEqual uses Equals from the equality comparer, but it has no need for hash code functionality (source code).

Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523