2

I have the following class:

public class Foo
{
    int year;       
    string name;    
    int category;   
}

Here is some example data:

2012    Test1   1000
2012    Test2   1000
2012    Test3   1000    
2012    Test4   1000
2012    Test4   10
...

If I override GetHashCode all results are very similiar:

return year ^ name ^ category;

int hash = 13;
    hash = hash * 33 + year.GetHashCode();
    hash = hash * 33 + name.GetHashCode();
    hash = hash * 33 + category.GetHashCode();
    return hash; 

What is a good hash function (with maximal distribution) for this situation?

Edit: Perhaps my understanding of the hash buckets is wrong. Go similar hash values to the same bucket?

"Test1".GetHashCode() --> -1556460260
"Test2".GetHashCode() --> -1556460257
Dave Clemmer
  • 3,741
  • 12
  • 49
  • 72
LuckyStrike
  • 1,483
  • 3
  • 12
  • 23
  • 2
    While that implementation of `GetHashCode` might be improvable, I would say it is OK for the hash codes to be similar, because the objects are similar, too! You certainly can't deduct a bad distribution from this – Daniel Hilgarth Aug 08 '13 at 13:24
  • Why 33? I think the common choice is a prime number (which 33 is not), though I can't tell you exactly why. – Bernhard Barker Aug 08 '13 at 13:25
  • Why are you concerned with the results looking similar? What are you really trying to accomplish? – Nick Zimmerman Aug 08 '13 at 13:25
  • Have you tried contatenating all elements as strings, then using GetHashCode() on the result? – Dave R. Aug 08 '13 at 13:26
  • 4
    @DaveR. that is a really inefficient approach to hashing – Marc Gravell Aug 08 '13 at 13:27
  • 1
    The C# compiler uses (for anonymous types) a seed of `2018130517` and a multiplier of `-1521134295` - just saying... – Marc Gravell Aug 08 '13 at 13:29
  • Respect the Skeet: http://stackoverflow.com/a/263416/3312 – Jesse C. Slicer Aug 08 '13 at 15:02
  • Pick any pair of constants from [this table](http://en.wikipedia.org/wiki/Linear_congruential_generator#Parameters_in_common_use) – Hans Passant Aug 08 '13 at 15:29
  • Hashing should result in similar hashed values for two similar values. The difference between "Test1" and "Test2" is very small, therefore the hash is similar. The purpose of a hash is not to obscure the data, but instead a hash allows verification of data integrity. If you want to obscure the data, then use encryption, not hashing. Read this: http://danielmiessler.com/study/encoding_encryption_hashing/ – Nick Zimmerman Aug 08 '13 at 15:59

1 Answers1

3

One of the things I would recommend is to check if the String object is null or not.

The implementation seems to be fine, it would be similar but the hashcodes should be different as the main objective is to make them land in different buckets, hence helping out for further operations.

   public int hashCode() {    // Assuming year and category are String like name.
    int hash = 31;
    hash = hash * 331 + (this.year != null ? this.year.GethashCode() : 0);
    hash = hash * 331 + (this.name != null ? this.name.GethashCode() : 0);
    hash = hash * 331 + (this.category != null ? this.category.GethashCode() : 0);

    return hash;
}

The few steps that I have learnt while Overriding hashCode are;

  1. Choose a prime hash e.g. 5, 7, 17 or 31 (prime number as hash, results in distinct hashcode for distinct object).
  2. Take another prime as multiplier different than hash is good.
  3. Compute hashcode for each member and add them into final hash. Repeat this for all members which participated in equals.
  4. Return hash.
JNL
  • 4,683
  • 18
  • 29