1

I am grouping log records by a RegEx pattern. After grouping them I'd like to get a Distinct count of the records for each group. For this example, Distinct is defined as the same visit key and the same year, month, day, hour, and minute.

It's just a way of getting a more accurate count of something getting logged all the way up the stack by different consumers.

Alright, so I'm grouping them like this:

var knownMessages = logRecords
    .Where(record => !string.IsNullOrEmpty(record.InclusionPattern))
    .GroupBy(record => new
    {
        MessagePattern = record.InclusionPattern
    })
    .Select(g => new KnownMessage
    {
        MessagePattern = g.Key.MessagePattern,
---->   Count = g.Distinct().Count(),
        Records = g.ToList()
    })
    .OrderByDescending(o => o.Count);

And GetHashCode for the type is implemented like this:

public override int GetHashCode()
{
    var visitKeyHash = this.VisitKey == null ?
        251 : this.VisitKey.GetHashCode();
    var timeHash = this.Time.Year + this.Time.Month + this.Time.Day +
        this.Time.Hour + this.Time.Minute;

    return ((visitKeyHash * 251) + timeHash) * 251;
}

But, for example, in the list I have three records that return the same hash code 1439926797; I still get a count of 3. I know it's leveraging GetHashCode (as I expected) to do the comparison because I have a breakpoint there to see what the hash code is.

What did I miss?

Mike Perrenoud
  • 66,820
  • 29
  • 157
  • 232
  • First off, that's not a very good way of combining hash values together in a manor that won't cause collisions. Second, you haven't shown the definition of `Equals`, which is of course essential to the definition of equality. – Servy Apr 18 '14 at 16:58
  • @Servy, I thought if the hash codes matched they would be considered equal. – Mike Perrenoud Apr 18 '14 at 16:59
  • You should implement IEquitable – Vasily Sliounaiev Apr 18 '14 at 16:59
  • @MichaelPerrenoud No. The `Equals` method is used to resolve hash collisions. – Servy Apr 18 '14 at 16:59
  • @Servy, okay so really I want to implement this code in the `Equals` method instead -- is that what I'm understanding? – Mike Perrenoud Apr 18 '14 at 17:00
  • 1
    @MichaelPerrenoud No. It's important to always override both or neither, never just one, and whenever overriding both they should both use the same general definition of "equality", else Bad Things will happen. – Servy Apr 18 '14 at 17:00
  • 3
    @MichaelPerrenoud The logic is : If `a.GetHashcode() != b.GetHashCode()` then `a != b`, If `a.GetHashCode() == b.GetHashCode() && a.Equals(b)` then `a == b`, All `GetHashcode()` does for you is lets you skip the `Equals()` check if you have two different values. That is why you need to implement both, If you only implement `Equals()` then the `a.GetHashCode() == b.GetHashCode()` step fails and it never tries the `Equals()` you implemented. – Scott Chamberlain Apr 18 '14 at 17:01
  • Related, [This may help](http://stackoverflow.com/a/22453193/2530848) – Sriram Sakthivel Apr 18 '14 at 17:05

3 Answers3

2

It seems you have not overridden the Equals method to use the same definition of equality as your hash code generation algorithm. Since that is used to resolve hash collisions, it is important that the two always be in sync.

Servy
  • 202,030
  • 26
  • 332
  • 449
  • Great! So I should just call `GetHashCode` from the `Equals` *as long as* the types are in fact the same; right? – Mike Perrenoud Apr 18 '14 at 17:01
  • @MichaelPerrenoud No, because hash codes can collide. Equal hash codes do not mean equal objects, unless there are less than 2^32 possible values for your type (in which case an effective hash means there are no hash collisions, and you're fine). – Servy Apr 18 '14 at 17:02
  • Okay, so I've written `GetHashCode` to cause like object to collide though. Further, based on @Scott's comment that is necessary to force `Equals` to even be called. So why would I not just call `GetHashCode` from the `Equals` method? Of course assuming they are the very same type. – Mike Perrenoud Apr 18 '14 at 17:04
  • 1
    @MichaelPerrenoud Again, because it's possible for different objects to have the same hash code. All equal objects must have the same has code, not all unequal objects have unequal hash codes. There are only 2^32 possible ints, but there are many more possible values of most types out there, including yours. – Servy Apr 18 '14 at 17:05
  • 1
    You can't call `GetHashCode` from `Equals`, barring the very few cases were all possible values have different hash codes, and even then that's likely a poor implementation. Your `Equals` does a comparison to see if they are equal in whatever way you have said means equals for the class in question (most often a set of equality comparisons across different fields). Your `GetHashCode` produces a number based on the same criteria. Hence if `x==y` means `x.GetHashCode() == y.GetHashCode()`, but `x.GetHashCode() == y.GetHashCode()` does not guarantee `x==y`. – Jon Hanna Apr 18 '14 at 17:05
2

You don't give your Equals override. As with other hash-based collections like Dictionary and HashSet, the internal structure used by Distinct() uses GetHashCode() to select a hash to store by, but Equals to determine actual equality.

The problem could be either a bug in your Equals or in your GetHashCode, but in the later case is that it doesn't correctly match your Equals (GetHashCode must return the same hash for two objects for which Equals returns true, but can of course also return the same for two different objects), which makes it a bug in the pair of methods. So either way, the problem is directly or indirectly in your override of Equals.

Jon Hanna
  • 110,372
  • 10
  • 146
  • 251
2

First let me repeat what I said in my comment.

The logic is : If a.GetHashcode() != b.GetHashCode() then a != b, If a.GetHashCode() == b.GetHashCode() && a.Equals(b) then a == b, All GetHashcode() does for you is lets you skip the Equals() check if you have two different values. That is why you need to implement both, If you only implement Equals() then the a.GetHashCode() == b.GetHashCode() step fails and it never tries the Equals() you implemented.

GetHashCode() should be fast and it's value should not change while it sits in a collection that depends on it's value. So don't modify VisitKey nor Time if you are storing these inside a Dictionary or HashSet or similar.

So all you need to do is:

public override int GetHashCode()
{
    var visitKeyHash = this.VisitKey == null ?
        251 : this.VisitKey.GetHashCode();
    var timeHash = this.Time.Year + this.Time.Month + this.Time.Day +
        this.Time.Hour + this.Time.Minute;

    return ((visitKeyHash * 251) + timeHash);
}

public override bool Equals(object obj)
{
    //Two quick tests before we start doing all the math.        
    if(Object.ReferenceEquals(this, obj))
        return true;

    KnownMessage message = obj as KnownMessage;
    if(Object.ReferenceEquals(message, null)))
        return false;

    return this.VisitKey.Equals(message.VisitKey) &&
           this.time.Year.Equals(message.Time.Year) &&
           this.time.Month.Equals(message.Time.Month) &&
           this.time.Day.Equals(message.Time.Day) &&
           this.time.Hour.Equals(message.Time.Hour) &&
           this.time.Minute.Equals(message.Time.Minute);
}
Scott Chamberlain
  • 124,994
  • 33
  • 282
  • 431
  • +1. I'd change the order `Object.ReferenceEquals` to first then casting with `as`. and also I'd use `if(Object.ReferenceEquals(message, null))` rather than `if(message == null)` because `==` operator can be overloaded, if it calls `Equals` again will result in `StackOverFlowException`. This is very common mistake :) – Sriram Sakthivel Apr 18 '14 at 17:15