21

If you run the following on HotSpot Java 7 64-bit version.

int countTopBit = 0, countLowestBit = 0;
for (int i = 0; i < 100000000; i++) {
    int h = new Object().hashCode();
    if (h < 0)
        countTopBit++;
    if ((h & 1) == 1)
        countLowestBit++;
}
System.out.println("The count of negative hashCodes was " + countTopBit + ", the count of odd hashCodes was " + countLowestBit);

you can get a result like

The count of negative hashCodes was 0, the count of odd hashCodes was 49994232

I was wondering if this means the Object.hashCode() is only really 31-bit and why this might be so?


It is not the case that the top bit is not used. From the source for HashMap

257   /**
258    * Applies a supplemental hash function to a given hashCode, which
259    * defends against poor quality hash functions.  This is critical
260    * because HashMap uses power-of-two length hash tables, that
261    * otherwise encounter collisions for hashCodes that do not differ
262    * in lower bits. Note: Null keys always map to hash 0, thus index 0.
263    */
264   static int hash(int h) {
265       // This function ensures that hashCodes that differ only by
266       // constant multiples at each bit position have a bounded
267       // number of collisions (approximately 8 at default load factor).
268       h ^= (h >>> 20) ^ (h >>> 12);
269       return h ^ (h >>> 7) ^ (h >>> 4);
270   }
Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • This might be as an optimization. Hash tables often take the hash code mod the number of buckets, but have to mask off the top bit first to avoid a negative index. Perhaps this is to avoid that step later? – templatetypedef Jan 21 '13 at 09:22
  • @templatetypedef I did wonder this might be the case, but I suspect Object.hashCode() is not used in Hash collections that often as it is usually overriden with a 32-bit hashCode(). Also HashMap mutates the raw hashCode so the top bit is used. – Peter Lawrey Jan 21 '13 at 09:24
  • 2
    @templatetypedef: but what about user overrides that don't do this? Hash table still needs to deal with this – Oliver Charlesworth Jan 21 '13 at 09:24
  • 2
    Same code for C# gives: "The count of negative hashCodes was 0, the count of odd hashCodes was 50000002", so the question might be valid for .net as well :) – SWeko Jan 21 '13 at 09:26
  • I agree that this doesn't always work because of overriding. However, it might be possidle for the VM internals to take advantage of it. – templatetypedef Jan 21 '13 at 09:26
  • 4
    Did you try this on 64-bit or 32-bit process? From [`Object`](http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/Object.html#hashCode()): "This is typically implemented by converting the internal address of the object into an integer, but..." - if all of your objects are allocated in the lowest 2GB of address space, than this implementation would never return a value with the high bit set. (And especially since each of these objects becomes instant garbage, you're likely always allocating in the same region of memory) – Damien_The_Unbeliever Jan 21 '13 at 09:27
  • @templatetypedef I have updates my question to show that the top bit is used. – Peter Lawrey Jan 21 '13 at 09:29
  • @Damien_The_Unbeliever The actual address is not used directly. e.g. all objects are on an 8-byte boundary so the lowest 3 bits are also 0 in the address, but not the in the hashCode. See an earlier question which show that objects with the same address don't get the same hashCode. http://stackoverflow.com/questions/13860194/what-is-an-internal-address-in-java – Peter Lawrey Jan 21 '13 at 09:30

1 Answers1

14

HotSpot supports a variety of hashing algorithms for Object. As you've discovered emprically, the top bit is always masked out before the result is returned:

// src/share/vm/runtime/synchronizer.cpp
static inline intptr_t get_next_hash(Thread * Self, oop obj) {
   ...
   value &= markOopDesc::hash_mask;
   ...
   return value;
}

markOopDesc::hash_mask is computed as follows:

  enum { age_bits                 = 4,
         lock_bits                = 2,
         biased_lock_bits         = 1,
         max_hash_bits            = BitsPerWord - age_bits - lock_bits - biased_lock_bits,
         hash_bits                = max_hash_bits > 31 ? 31 : max_hash_bits,
         ...
         hash_mask               = right_n_bits(hash_bits),

As you can see, markOopDesc::hash_mask always has bit 31 set to zero.

As to why this is done, your guess is as good as mine. It could have been that the original developer felt that only dealing with positive integers would simplify things down the line. For all we know, it could even be an off-by-one error in the hash_bits computation. ;-)

NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • +1 As SWeko found, C# does the same thing which suggests there might have been a common reason. Or it could be they copied a common implementation which did this for reasons which no longer apply. – Peter Lawrey Jan 21 '13 at 09:47
  • It could be possible that `right_n_bits` does not take 32 as an argument correctly. I am building Java 8 to so I have a copy of the source to refer to. ;) – Peter Lawrey Jan 21 '13 at 09:51