Typically the default implementation of Object.hashCode() is some function of the allocated address of the object in memory (though this is not mandated by the JLS). Given that the VM shunts objects about in memory, why does the value returned by System.identityHashCode() never change during the object's lifetime?

If it is a "one-shot" calculation (the object's hashCode is calculated once and stashed in the object header or something), then does that mean it is possible for two objects to have the same identityHashCode (if they happen to be first allocated at the same address in memory)?

oxbow_lakes
butterchicken
  • Related question: Is that memory address a real memory address or something virtual that can stay fixed even as the object gets shuffled about? If virtual, that would be nice because the pointers to it would not need to be adjusted. On the other hand, this would mean an extra indirection and a potentially big mapping table. – Thilo Jun 30 '09 at 11:06
  • It's a slight rearrangement of the address when first requested. (Returning a hash code with low bits all zero isn't great.) – Tom Hawtin - tackline Jun 30 '09 at 11:14
  • Actually, where does it say that the identityHashCode must never change? The JavaDoc for System.identityHashCode is not clear on that. – Thilo Jun 30 '09 at 11:32
  • Of course, if identityHashCode did change, you could only use objects that implement hashCode() as keys in hash tables. – Thilo Jun 30 '09 at 12:06
  • Thilo - it follows from the specification of hashCode and equals in Object. – Tom Hawtin - tackline Jun 30 '09 at 12:45
  • Okay, got it: "Whenever (hashCode) is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified." And equals in this case is object identity comparison. – Thilo Jul 01 '09 at 01:15
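To see why low bits that are all zero hurt, here is a small sketch. It assumes objects are 8-byte aligned and that a power-of-two table picks its bucket with `hash & (tableSize - 1)`. The addresses are made up, and `mix` is a hypothetical bit-folding step, not HotSpot's actual algorithm.

```java
import java.util.HashSet;
import java.util.Set;

// Why raw, aligned addresses make poor hash codes.
// Assumptions: 8-byte object alignment, and a power-of-two table indexed by
// hash & (tableSize - 1). The addresses below are made up, not real JVM addresses.
class AlignedAddressHashing {

    // Counts how many distinct buckets the given hash codes fall into.
    static int distinctBuckets(int[] hashes, int tableSize) {
        Set<Integer> buckets = new HashSet<>();
        for (int h : hashes) {
            buckets.add(h & (tableSize - 1)); // tableSize must be a power of two
        }
        return buckets.size();
    }

    // Hypothetical mixing step that folds higher bits into the low bits.
    // This is NOT HotSpot's algorithm, just an illustration.
    static int mix(int address) {
        return address ^ (address >>> 3) ^ (address >>> 16);
    }

    // Fake addresses of `count` consecutive 8-byte-aligned objects.
    static int[] alignedAddresses(int base, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = base + 8 * i;
        }
        return out;
    }

    // Applies mix() to every address.
    static int[] mixAll(int[] addresses) {
        int[] out = new int[addresses.length];
        for (int i = 0; i < addresses.length; i++) {
            out[i] = mix(addresses[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] raw = alignedAddresses(0x10000, 64);
        System.out.println("raw:   " + distinctBuckets(raw, 64) + " of 64 buckets used");
        System.out.println("mixed: " + distinctBuckets(mixAll(raw), 64) + " of 64 buckets used");
    }
}
```

Run as-is, it reports 8 of 64 buckets for the raw addresses (the three always-zero low bits waste seven buckets out of every eight) and all 64 for the mixed ones.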

5 Answers

Modern JVMs save the value in the object header. I believe the value is typically calculated only on first use in order to keep time spent in object allocation to a minimum (sometimes down to as low as a dozen cycles). The common Sun JVM can be compiled so that the identity hash code is always 1 for all objects.

Multiple objects can have the same identity hash code. That is the nature of hash codes.
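A quick way to observe this behaviour, assuming a HotSpot-style JVM: record each object's identity hash code, request a collection, and compare. `System.gc()` is only a hint and the check cannot tell whether anything actually moved, so a `true` result is consistent with the "computed once, stored in the header" explanation rather than proof of it.

```java
// Records identity hash codes, requests a GC, and checks that none changed.
// System.gc() is only a hint, so a `true` result is consistent with the
// "computed once, stored in the header" explanation but does not prove objects moved.
class IdentityHashStability {

    static boolean stableAcrossGc(int objectCount) {
        Object[] objects = new Object[objectCount];
        int[] before = new int[objectCount];
        for (int i = 0; i < objectCount; i++) {
            objects[i] = new Object();
            before[i] = System.identityHashCode(objects[i]); // first request fixes the value
        }
        System.gc(); // may compact or move the surviving objects
        for (int i = 0; i < objectCount; i++) {
            if (System.identityHashCode(objects[i]) != before[i]) {
                return false;
            }
            // Object does not override hashCode(), so this is the identity hash code too.
            if (objects[i].hashCode() != before[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("stable across GC: " + stableAcrossGc(10_000));
    }
}
```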

Tom Hawtin - tackline
  • Right - I've just looked thru ObjectSynchronizer::FastHashCode in synchronizer.cpp (VM runtime source code) and after generating the hash code, it looks like it merges it into the object header. Looks like there are several possible implementations of HashCode; the one you allude to that returns 1 for all objects is used to ensure no part of the VM assumes hash codes are unique for any reason. – butterchicken Jun 30 '09 at 12:33
  • public static native int identityHashCode(Object x); is a native method. Can you explain it from the perspective of the native code, i.e. the C++ implementation? It is mainly used in IdentityHashMap, right? – Clark Bao Mar 09 '12 at 00:36
  • @Tom What do you mean by object header? You also wrote " I believe the value is typically calculated only on first use in order to keep object allocation to a minimum (sometimes down to as low as a dozen cycles)." Can you explain which object allocation you are referring to here? – Geek Aug 31 '13 at 16:39
  • @Geek I meant the execution time spent allocating an object is kept to a minimum (I have clarified the text). Every object (including arrays) in a typical Java implementation will start with some bytes indicating the runtime type, the monitor for intrinsic locking, possibly GC-related bits and the identity hash code. Actual details may be quite complicated because it needs to be heavily optimised. – Tom Hawtin - tackline Sep 01 '13 at 01:14
  • @Lii Identity and monitors on objects are rarely used, yet they are still always there. This severely hampers the JVM, but there you go. Where are you proposing the header be expanded to? Stop the machine and track down every incoming reference for every object so used? / You are right, in that typically a few bits short of four bytes will be used for the hash code. Some implementations may do peculiar things, such as copying the hash out onto the stack during synchronisation to make more room for nice contended-lock behaviour. No need for the downvote, IMO. – Tom Hawtin - tackline Jun 12 '15 at 20:09
  • @TomHawtin-tackline: You seem to respond to a comment of mine which I have removed. I only vaguely remember this incident. I think I commented and downvoted based on a misunderstanding, then later discovered I was wrong. I came back and removed the comment, but the downvote was already locked. Sorry about that! – Lii Jul 11 '15 at 09:14
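On the `IdentityHashMap` point: it compares keys with `==` instead of `equals()`, and in OpenJDK it hashes them with `System.identityHashCode`. This sketch contrasts it with `HashMap` using two equal but distinct `String` objects; the class and method names are made up for the example.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Same two keys, two maps: HashMap uses equals()/hashCode(),
// IdentityHashMap uses reference identity.
class IdentityVsEquality {

    static int[] sizes() {
        String a = new String("key");
        String b = new String("key"); // equal to a, but a distinct object

        Map<String, Integer> byEquals = new HashMap<>();
        byEquals.put(a, 1);
        byEquals.put(b, 2); // equals() says it is the same key, so this overwrites

        Map<String, Integer> byIdentity = new IdentityHashMap<>();
        byIdentity.put(a, 1);
        byIdentity.put(b, 2); // different objects, so this adds a second entry

        return new int[] { byEquals.size(), byIdentity.size() };
    }

    public static void main(String[] args) {
        int[] s = sizes();
        // prints "HashMap size: 1, IdentityHashMap size: 2"
        System.out.println("HashMap size: " + s[0] + ", IdentityHashMap size: " + s[1]);
    }
}
```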

In answer to the second question, irrespective of the implementation, it is possible for multiple objects to have the same identityHashCode.

See bug 6321873 for a brief discussion on the wording in the javadoc, and a program to demonstrate non-uniqueness.
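An independent sketch (not the program attached to the bug report) that keeps every object reachable and stops at the first pair of live objects sharing an identity hash code. How quickly it finds one depends on the JVM's hashing scheme and the number of hash bits available, so the `maxObjects` bound is an arbitrary assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Allocates objects, keeping all of them reachable, until two live objects
// report the same System.identityHashCode. Nothing guarantees a collision
// within maxObjects; how soon one appears depends on the JVM's hashing scheme.
class IdentityHashCollision {

    // Returns how many objects were allocated when the first collision appeared, or -1.
    static int findCollision(int maxObjects) {
        Map<Integer, Object> seen = new HashMap<>();
        for (int i = 0; i < maxObjects; i++) {
            Object o = new Object();
            Object earlier = seen.putIfAbsent(System.identityHashCode(o), o);
            if (earlier != null) {
                // `earlier` is still reachable via the map, so both objects are alive
                // and distinct, yet they share an identity hash code.
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("first collision after " + findCollision(1_000_000) + " objects");
    }
}
```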

Stephen Denne
  • True. Two different objects can have the same hashCode. That is the case with all hash functions (over a domain bigger than their result size). – Thilo Jun 30 '09 at 11:11
  • @Thilo: The JVM could have been written in such fashion as to guarantee that, provided there are never more than four billion objects in existence at once, `identityHashCode` would never return a value which had been returned for any other object that is still in existence. Depending upon how the memory manager is implemented, this could be expensive, or it might add zero additional cost. For example, an `Object` could contain an index into a table of pointers, with each object being immutably assigned a table slot for as long as it exists. Typical JVM implementations don't do that... – supercat Sep 07 '13 at 20:52
  • ...but some other "handle-based" memory-management schemes do, so it may be worthwhile to document that the JVM essentially picks an arbitrary number the first time an object is asked for its identity hash code, and then stores it for later use. (Btw, I don't recall ever reading any official documentation on whether `identityHashCode` is thread-safe. If an object's hash code has never been retrieved, is there any guarantee that simultaneous "first" calls to `identityHashCode` on that object will yield the same value?) – supercat Sep 07 '13 at 20:56
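One way an implementation can make concurrent "first" calls agree is to install the value with a single atomic compare-and-set. The following is a toy model, not JVM code: the `LazyHash` class is hypothetical, it assumes 0 means "not assigned yet", and every caller that loses the race simply reads the winner's value.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of a lazily assigned, one-shot hash code (hypothetical, not JVM code).
class LazyHashDemo {

    static final class LazyHash {
        private final AtomicInteger value = new AtomicInteger(0); // 0 = not assigned yet

        int get() {
            int h = value.get();
            if (h != 0) {
                return h; // already assigned; it never changes afterwards
            }
            int candidate;
            do {
                candidate = ThreadLocalRandom.current().nextInt();
            } while (candidate == 0); // 0 is reserved as the "unassigned" marker
            // Only one compareAndSet can succeed; every loser reads the winner's value.
            return value.compareAndSet(0, candidate) ? candidate : value.get();
        }
    }

    // Starts several threads that all make the "first" call at roughly the same time.
    static boolean allThreadsAgree(int threadCount) {
        LazyHash lazy = new LazyHash();
        int[] results = new int[threadCount];
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> results[slot] = lazy.get());
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // join() also makes each thread's write to results visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        for (int r : results) {
            if (r != results[0]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all threads agree: " + allThreadsAgree(8));
    }
}
```

This only shows that such a scheme is possible; it says nothing about what the Java specification actually guarantees.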

The header of an object in HotSpot consists of a class pointer and a "mark" word.

The source code of the data structure for the mark word can be found in the markOop.hpp file. In this file there is a comment describing the memory layout of the mark word:

hash:25 ------------>| age:4 biased_lock:1 lock:2 (normal object)

Here we can see that the identity hash code for normal Java objects on a 32-bit system is saved in the mark word and is 25 bits long.
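To make the 32-bit layout concrete, here is a hypothetical helper that packs and unpacks those fields. The shifts follow directly from the field widths in the comment (lock in bits 0-1, biased_lock in bit 2, age in bits 3-6, hash in bits 7-31); the class and method names are invented for illustration. Since only 25 bits of hash fit, this layout can hold at most 2^25 distinct identity hash codes.

```java
// Hypothetical helper for the 32-bit "normal object" mark word layout quoted above:
// hash:25 | age:4 | biased_lock:1 | lock:2 (most significant bits first).
// Illustration only; not taken from HotSpot.
class MarkWord32 {
    static final int LOCK_BITS = 2;
    static final int BIASED_LOCK_BITS = 1;
    static final int AGE_BITS = 4;
    static final int HASH_BITS = 25;

    static final int BIASED_LOCK_SHIFT = LOCK_BITS;                     // bit 2
    static final int AGE_SHIFT = BIASED_LOCK_SHIFT + BIASED_LOCK_BITS;  // bits 3-6
    static final int HASH_SHIFT = AGE_SHIFT + AGE_BITS;                 // bits 7-31

    static final int LOCK_MASK = (1 << LOCK_BITS) - 1;
    static final int BIASED_LOCK_MASK = (1 << BIASED_LOCK_BITS) - 1;
    static final int AGE_MASK = (1 << AGE_BITS) - 1;
    static final int HASH_MASK = (1 << HASH_BITS) - 1;

    // Builds a mark word; inputs wider than their field are truncated.
    static int pack(int hash, int age, int biasedLock, int lock) {
        return ((hash & HASH_MASK) << HASH_SHIFT)
                | ((age & AGE_MASK) << AGE_SHIFT)
                | ((biasedLock & BIASED_LOCK_MASK) << BIASED_LOCK_SHIFT)
                | (lock & LOCK_MASK);
    }

    static int hash(int mark) { return (mark >>> HASH_SHIFT) & HASH_MASK; }
    static int age(int mark) { return (mark >>> AGE_SHIFT) & AGE_MASK; }
    static int biasedLock(int mark) { return (mark >>> BIASED_LOCK_SHIFT) & BIASED_LOCK_MASK; }
    static int lock(int mark) { return mark & LOCK_MASK; }

    public static void main(String[] args) {
        int mark = pack(0x1ABCDEF, 5, 0, 1);
        System.out.printf("mark=0x%08X hash=0x%X age=%d biased=%d lock=%d%n",
                mark, hash(mark), age(mark), biasedLock(mark), lock(mark));
        // A full 32-bit value does not fit: only the low 25 bits survive.
        System.out.printf("hash of pack(0xFFFFFFFF, ...) = 0x%X%n", hash(pack(0xFFFFFFFF, 0, 0, 1)));
    }
}
```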

Lii

The general guidelines for implementing a hash function are:

  • the same object should return a consistent hashCode; it should not change with time or depend on any variable information (e.g. an algorithm seeded by a random number, or the values of mutable member fields).
  • the hash function should have a good random distribution: if you think of hash codes as buckets, two objects should map to different buckets (hash codes) as far as possible. Two objects having the same hash code should be rare, although it can happen.
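A minimal value class that follows these guidelines might look like the sketch below. `Point` is a made-up example; `java.util.Objects.hash` combines the fields, and because both fields are `final`, the hash code cannot change over the object's lifetime.

```java
import java.util.Objects;

// A made-up immutable value class whose hashCode follows the guidelines above.
final class Point {
    private final int x; // final fields: the hash code cannot change over the object's lifetime
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields that equals() compares; Objects.hash mixes them
        // so that, for example, (1, 2) and (2, 1) get different hash codes.
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        System.out.println(new Point(1, 2).hashCode() == new Point(1, 2).hashCode()); // true
        System.out.println(new Point(1, 2).hashCode() == new Point(2, 1).hashCode()); // false
    }
}
```

Collisions are still possible with `Objects.hash`, which is exactly the "rare, but it can happen" case described in the second guideline.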
Gishu

As far as I know, this is implemented to return the reference, which never changes during an object's lifetime.

Mnementh
  • So you are saying that the reference is not a real memory address (or directly derived from that). So is it a sort of a pointer to the real memory address? – Thilo Jun 30 '09 at 11:09