From what I understand, some Java implementations implement identityHashCode
by reserving two bits the an object header to divide objects into three categories:
Those for which identityHashCode()
has never been called.
Those for which the first-ever call to identityHashCode()
occurred after the most recent time the object moved.
Those for which the first-ever call to identityHashCode()
occurred before the most recent time the object moved.
For an objects of the first category, a call to identityHashCode()
will return a value related to the object's address and set a flag to change the object into the second category. For an object in the second category, the identityHashCode()
will compute the same value from the address as the first call. Any time the GC is going to move an object, it checks whether it is in the second category above. If so, the GC reserves an extra four bytes at the destination which will be used to hold the identity hash value the object had before it moved.
If an object has its hash code taken, the effective size of the object will increase by four bytes the next time the GC has to move it. Most objects never have their identity hash code taken, however, and some of them have their identity hash code taken but are immediately collected. Having an object's first call to identityHashCode()
allocate an extra four bytes for an object would be difficult, but allocating an extra four bytes when the object is moved is not a problem. Using the object's address as its hash value between the first call to identityHashCode()
and the next time the object moves avoids the need to allocate the storage when the space following the object is in use.
Note that while an identity hash method could "legally" use the bit pattern of the address directly, doing so could cause excessive number of hash duplicates (e.g. the first object created following one GC cycle could easily have the same address as the first object created after another). A simple way to avoid that problem is to have the system either add, xor, or otherwise combine the address with a value which gets changed after every GC cycle. Such an approach would make it easy to spread hash values out over a wider range.