
I came across a hashCode function that does roughly the following:

class MyClass {
    private String string;
    // ...other data members and methods...

    @Override
    public int hashCode() {
        int result = 17;
        if (string != null) {
            result = result * 31 + string.hashCode();
        }
        return result;
    }
}

I am not fully convinced by the method used to calculate the hash code. I know that using prime numbers generally yields a better distribution, but in this implementation I am not convinced that is the case.

For example, assuming a standard hash implementation, I would miss all the buckets between 0 and 17*31.

Is there maybe some subtlety that I am not seeing?

user8469759
  • You didn't ask about Eclipse, but since this is the default implementation that Eclipse generates, I think the linked question and answer probably solves the question of where it comes from and how it's supposed to work. If this doesn't solve your question, let me know and I can reopen. – Jeff Bowman Feb 17 '22 at 16:19
  • It doesn't because it doesn't seem to explain the rationale behind it. From what I can tell it simply shifts by 17*31 the hashCode of the string, which doesn't change the distribution at all, at most in my mind makes it worse since we're missing a lot of buckets and I can't explain the reason why it would be a good idea. – user8469759 Feb 17 '22 at 16:21
  • 1
    I bet that when placing a value to buckets (e.g. in a HashMap) according to their hashcode, the system selects the bucket with modulo, so `bucket = hashcode % numOfBuckets`. So you don't miss all the buckets from 0 to 17*31. Having said that, this auto-generated code would make sense if there were more fields in the class. For a single field, you might as well return the hashcode of that field. – Nikos Paraskevopoulos Feb 17 '22 at 16:29
  • So the hashCode in java is the integer before the modulus applied then? In my intuition the hashCode would be an index for a bucket which would've included the modulus already applied, – user8469759 Feb 17 '22 at 16:39
  • @user8469759 `hashCode` is necessarily independent of the use case; a hash-compatible object produces a single `int`. Hash tables can be of any size, and will vary in size depending on population count and amortization cost. There's no chance for the hash table size to influence the return of `hashCode`, so it only makes sense that a modulus or bitmask would be applied later by the hash table implementation itself. – Jeff Bowman Feb 17 '22 at 16:42
  • No one should be writing their own hash code algorithms in Java. I know we all did it in CS courses, but if an object (like in your example) is basing its hash code a single value, just calling that object’s `hashCode()` is sufficient. If it might be null, use [Objects.hashCode](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Objects.html#hashCode(java.lang.Object)). If the object's hash code is based on more than one value, use [Objects.hash](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Objects.html#hash(java.lang.Object...)). – VGR Feb 17 '22 at 23:49
  • Thanks for all the insights, as I said it's a code I've found and I am not a Java expert. I would appreciate if people would just answer the question (which has been done) rather than saying "no one should do this". Comments like those are simply pointless and make feel people stupid for asking the question in the first place. – user8469759 Feb 18 '22 at 09:18

2 Answers


You're missing the overflow case, which is actually quite likely in hashCode implementations.

In particular, string.hashCode() can be negative.
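A quick sketch of that point (my example, not from the answer; it assumes only the standard `java.lang.String.hashCode` semantics, h = 31*h + c per character with int wrap-around):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        // Six 'a' characters already push 31*h + 97 past Integer.MAX_VALUE,
        // so the int wraps around and the hash comes out negative.
        String s = "aaaaaa";
        System.out.println(s.hashCode()); // negative after wrap-around

        // The question's combining step wraps the same way, so the final
        // result is not confined to values above 17*31 at all.
        int result = 17;
        result = result * 31 + s.hashCode();
        System.out.println(result); // also negative
    }
}
```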

Louis Wasserman

As in the question Is the hashCode function generated by Eclipse any good? (originally duped against this answer, reopened by request), this hashCode function matches implementations built into Java and recommended by Java co-author Joshua Bloch in Effective Java Item 9. This is similar to the Annotation docs, which prescribe a hash function that is the sum of (member value hash code) xor (127 * member name hash code) over all members. By starting with prime numbers (here, 17 and 31), the hash factors are necessarily coprime.

As in the Objects.hashCode documentation, the important things are that the hashCode is consistent between runs, consistent with equals, and distinct if practical.
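As VGR's comment notes, when the class hashes a single nullable field, `Objects.hashCode`/`Objects.hash` already cover it. A minimal sketch of that rewrite (the constructor and `equals` are my additions, included to keep the equals/hashCode contract visible):

```java
import java.util.Objects;

final class MyClass {
    private final String string;

    MyClass(String string) { this.string = string; }

    @Override
    public boolean equals(Object o) {
        return o instanceof MyClass && Objects.equals(string, ((MyClass) o).string);
    }

    @Override
    public int hashCode() {
        return Objects.hash(string); // null-safe; equal fields give equal hashes
    }

    public static void main(String[] args) {
        System.out.println(new MyClass("abc").hashCode() == new MyClass("abc").hashCode()); // true
        System.out.println(new MyClass(null).hashCode()); // no NullPointerException
    }
}
```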

One major factor about hash code design is that hash codes will wrap around. As in the OpenJDK8 code for HashMap:

Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
    (first = tab[(n - 1) & hash]) != null) {

The table length, necessarily a power of two, becomes a mask for the hashCode: for a hash table of size 64 the hash gets a bit mask of 63, 0b00111111. Given the prime number "hash smear", these low bits will be well-distributed; for a single-field hash function the 17 and 31 factors change little, but they are of particular advantage when two, three, or fifty fields are all combined into a single hash function. The absolute magnitude of the returned hashCode doesn't matter, as long as the appropriate low bits of the hash codes are well-distributed.
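To make the masking step concrete, a small sketch (the table size and key string are my own choices, not taken from HashMap itself):

```java
public class BucketDemo {
    public static void main(String[] args) {
        int tableLength = 64;             // HashMap table sizes are powers of two
        int mask = tableLength - 1;       // 63 == 0b00111111, a low-bit mask
        int hash = "example".hashCode();  // any int, possibly negative
        int bucket = mask & hash;         // index is always in [0, 63]
        System.out.println(bucket);

        // Even a negative hash masks into range, since & keeps only low bits:
        System.out.println(mask & -1425372064);
    }
}
```

OpenJDK's HashMap additionally XORs the high half of the hash down into the low bits (h ^ (h >>> 16)) before masking, so entropy in the high bits isn't lost entirely.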

Jeff Bowman