1

Is it a good idea to generate a random number in the constructor and return this value from hashCode method? There is a chance for collisions, but this applies when writing your own hashCode method. So what are the drawbacks? When using this object in a HashMap it will be stored with the random number as a hash and then retrieved by the same. If there are collisions they will be resolved by equals.

Vallerious
  • 570
  • 6
  • 13
  • 2
    why do you want to do this? i.e. as opposed to using the default implementation from `Object` which has the right invariants – Sam Mason Oct 21 '20 at 15:07

3 Answers3

7

The hashCode contract specifies, among other things, that

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

So no, making it random is a bad idea.

Federico klez Culloca
  • 26,308
  • 17
  • 56
  • 95
  • 1
    Things can go wrong if the contract isn't followed as specified: https://stackoverflow.com/a/62477121/6099347 – Aniket Sahrawat Oct 21 '20 at 14:43
  • If we leave the 'contract' behind and disregard it as a bad practice, what are the practical implications? You can still implement the `equals` properly, but when using a HashMap you care mostly about the `hashCode` right? If there are less collisions then the map would work in an optimised way. – Vallerious Oct 22 '20 at 12:51
  • I frankly don't know. But the point is that you shouldn't care (and you should care about contracts, as these are assumptions other code will make about your code). If you *do* care the best thing you could do is try it yourself and see if you get better performance or if it breaks the whole concept. – Federico klez Culloca Oct 22 '20 at 13:33
  • @Vallerious Not following the contracts brings bugs in the system because people reading your code don't always have enough time to go through every line. If the system has 1m lines of code, nobody will read whole but make an assumption that the method names do what they claim to do and all contracts are followed. Suppose someone changed the concrete implementation for eg without going through the whole class which has 10k lines of code, in that case, the person will introduce bugs. The point being, following the contract is always better, don't worry about squeezing out the last CPU cycle. – Aniket Sahrawat Oct 23 '20 at 09:52
  • @Vallerious When using a HashMap you care about `equals` as much as you care about `hashCode`. The algorithm for retrieving an object is to get all that have the same hash, then iterate them and return the one that is equal. – Lubo Kanev Feb 15 '22 at 10:56
1

Basic answer

In the typical use case, you want to retrieve not with the same (in terms of identity) key object that was used to insert it in the hashing collection, but with an object which is logically equivalent (in terms of equal).

Thus, you want the following code to find the element in the map

Key k1 = new Key("param1", "param2");
Key k2 = new Key("param1", "param2");
Value val1 = new Value();

map.put(k1, value);

Value val2 = map.get(k2);

If hashCode is based solely on random number, than k1 and k2 have different hashes, and thus may point you to different buckets of the HashMap.

We showed that for objects implementing equals, random hashCode is a poor idea. This is the reasoning behind the part of hash code contract:

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

A bit more advanced part:

Let's take a look at Object.hashCode implementation. Note that the identity hashCode() implementation is dependant on the JVM.

Looking at JDK sources, get_next_hash offers six methods to calculate hashCode:

0. A randomly generated number.
1. A function of memory address of the object.
2. A hardcoded 1 (used for sensitivity testing.)
3. A sequence.
4. The memory address of the object, cast to int.
5. Thread state combined with xorshift (https://en.wikipedia.org/wiki/Xorshift)

Code for reference:

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = cast_from_oop<intptr_t>(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

Both OpenJDK 7 and OpenJDK 6 use the first method, a random number generator.

OpenJDK 8 changed the default to 5 - Thread state combined with xorshift.

Note that all of these implementations of Object.hashCode are consistent with Object.equals (in terms of hash code contract)

To sum up, you would be implementing one of the strategies behind Object.hashCode, but taking a risk of breaking the contract if you implement equals.

Lesiak
  • 22,088
  • 2
  • 41
  • 65
  • `To sum up, you would be implementing one of the strategies behind Object.hashCode, but taking a risk of breaking the contract if you implement equals.` - very nice summary – ankitkpd Jul 27 '23 at 14:25
1

If you do nothing and use Object.hashCode() (the "memory address" of the object), then you more or less have what you want to achieve. Hence you can have a HashMap/HashSet key of any class.

Still have a safe equals.

Joop Eggen
  • 107,315
  • 7
  • 83
  • 138