It is often claimed that the implementation of Object.hashCode() (the default implementation for all objects) gives the memory address of the object. That claim is often attached to an explanation of the peculiar output produced by Object.toString().

See here for an example.
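For reference, the "peculiar output" comes from the documented behaviour of Object.toString(): the class name, an `@`, and the hash code in unsigned hexadecimal. The sketch below rebuilds that string by hand; the class and method names (`DefaultToStringDemo`, `describe`) are made up for illustration.

```java
// Sketch: reproduce the default Object.toString() output by hand.
class DefaultToStringDemo {

    /** Builds the string that Object.toString() is documented to return. */
    static String describe(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        Object o = new Object();
        // Both lines print the same text: "java.lang.Object@" followed by hex digits.
        System.out.println(o);
        System.out.println(describe(o));
    }
}
```

The hex digits after the `@` are simply the hash code; nothing in this format says the value is an address.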

This is certainly not the case for any JVMs/JREs I am aware of, not least because addresses are usually 64 bits long now, whereas hashCode() returns a 32-bit int. Garbage collectors also relocate objects, so the address changes. I've seen claims that it can be the initial memory address of the object. But as many objects would then have similar addresses, that would be a poor choice for a hash code.
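The relocation point can be checked from plain Java, at least indirectly. The following is a minimal sketch (the class name `StableIdentityHashDemo` and the helper `hashSurvivesGc` are made up): it records System.identityHashCode() for an object, churns the heap, requests a collection, and compares. Whether the collector actually moves the object is up to the JVM, so this only illustrates that the value must stay the same either way.

```java
// Sketch: the identity hash code of an object must not change over its
// lifetime, even if the garbage collector relocates the object.
class StableIdentityHashDemo {

    /** Returns true if the identity hash is unchanged after a GC request. */
    static boolean hashSurvivesGc(Object o) {
        int before = System.identityHashCode(o);

        // Create some garbage, then ask for a collection. System.gc() is
        // only a hint, and the JVM decides whether anything is moved.
        for (int i = 0; i < 100_000; i++) {
            byte[] junk = new byte[128];
        }
        System.gc();

        return before == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        System.out.println("int width in bits: " + Integer.SIZE);
        System.out.println("hash stable across GC: " + hashSurvivesGc(new Object()));
    }
}
```

A JVM that derived the value from the current address on every call could not satisfy this if the object were moved, so the value has to be computed once and remembered somewhere.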

Are there, or have there ever been, any widely used JVMs/JREs for which it was the (initial) memory address of the object?

I am aware that the JavaDoc for the Object class suggests that the hashCode for an implementation might be the memory address. But I suspect that is a grossly out of date statement that has never been updated.

Indeed, the current Oracle JVM does not use the memory address (but can be configured to do so):

https://stackoverflow.com/a/16105878/545127

The idea that the hashCode is a memory address is a historical artefact:

https://stackoverflow.com/a/13860488/545127

My question is whether any widely used JVM (and if so, which) has used the memory address as its default implementation.

Raedwald
  • In HotSpot it is configurable: https://stackoverflow.com/questions/16105420/java-object-hashcode-address-or-random/16105878#16105878 – Boann Mar 26 '16 at 15:16
  • Presumably the Sun JVM at the time the javadoc was amended to include that sentence? (Do you mean to ask whether there are any *current* JVMs that do this?) – meriton Mar 26 '16 at 15:46
  • See also http://stackoverflow.com/questions/13860194/what-is-an-internal-address-in-java – Raedwald Mar 26 '16 at 15:57

1 Answer

Since the default hash code of an object does not need to be unique, returning the whole address is not necessary. An implementation could take a group of bits from the address - say, bits 3 through 35 on a 64-bit system, or an XOR of the upper 32 bits and the lower 32 bits, or simply the lower 32 bits.
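To make those three options concrete, here is a minimal sketch in Java. Java does not expose object addresses, so the `long` values below are made-up stand-ins for 64-bit addresses, and the class and method names are hypothetical. The shifted variant drops the 3 lowest bits and keeps the next 32, which is one reading of the "bits 3 through ..." idea.

```java
// Sketch: three ways to squeeze a (pretend) 64-bit address into a 32-bit int.
class AddressFoldingDemo {

    /** Keeps only the lower 32 bits. */
    static int lowBits(long address) {
        return (int) address;
    }

    /** Drops the 3 lowest bits (zero for 8-byte-aligned addresses), keeps the next 32. */
    static int shiftedBits(long address) {
        return (int) (address >>> 3);
    }

    /** XORs the upper 32 bits into the lower 32 bits. */
    static int xorFold(long address) {
        return (int) (address ^ (address >>> 32));
    }

    public static void main(String[] args) {
        // Hypothetical addresses of two neighbouring, 8-byte-aligned objects.
        long a = 0x0000_7f3a_1c00_1000L;
        long b = a + 16;

        for (long address : new long[] {a, b}) {
            System.out.printf("%016x -> low=%08x shifted=%08x xor=%08x%n",
                    address, lowBits(address), shiftedBits(address), xorFold(address));
        }
    }
}
```

With aligned addresses the lowest bits of `lowBits` and `xorFold` are always zero, while `shiftedBits` gives neighbouring objects nearby but distinct values.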

> But as many objects would then have similar addresses [due to garbage collection], that would be a poor choice for a hash code.

Hash codes that are numerically close to each other are OK. Even a small number of identical hash codes would not create a problem, because equality is used to resolve any ties. Situations in which the default hash code implementation is used are generally limited, because objects that are used as keys in hash-based containers are expected to provide "good" implementations of the hashCode method.
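The tie-breaking can be seen with a deliberately bad key class. In this minimal sketch (`CollidingKeysDemo`, `Key`, and `lookup` are made-up names), every key reports the same hash code, yet a HashMap still keeps the entries apart because it falls back to equals().

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: identical hash codes are tolerated; equals() resolves the ties.
class CollidingKeysDemo {

    static final class Key {
        private final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42; // deliberately identical for every key
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Key && ((Key) other).name.equals(name);
        }
    }

    /** Stores two colliding keys, then looks one up by an equal key. */
    static int lookup(String name) {
        Map<Key, Integer> map = new HashMap<>();
        map.put(new Key("a"), 1);
        map.put(new Key("b"), 2);
        return map.get(new Key(name));
    }

    public static void main(String[] args) {
        System.out.println(lookup("a")); // prints 1
        System.out.println(lookup("b")); // prints 2
    }
}
```

Collisions cost lookup time rather than correctness, which is why a merely adequate default hash is acceptable.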

Oracle says that the default implementation of their JVM uses the internal address of the object, whatever that means, to compute its hashCode. However, other JVM implementations are not required to do the same.

Here is a quote from Oracle's documentation:

> As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

You can find the actual implementation of the algorithm here. Search for the get_next_hash function for details. It appears that computing the hash from the address is done with a simple conversion:

    value = intptr_t(obj);

Sergey Kalinichenko
  • Is that statement about the internal address *actually true*? I've seen (sorry no reference) an article that said otherwise. – Raedwald Mar 26 '16 at 15:09
  • @Raedwald That's a copy-paste [straight from the horse's mouth](http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode%28%29) – Sergey Kalinichenko Mar 26 '16 at 15:11
  • The lower 32 bits is a poor choice because object addresses will be aligned on word boundaries. – Raedwald Mar 26 '16 at 15:12
  • I know what the JavaDoc says. I have reason to believe it is simply not true. Presumably by being grossly out of date but never changed. – Raedwald Mar 26 '16 at 15:14
  • @Raedwald That's why I mentioned the possibility of shifting by 3 bits (bits 3 through 35). But here's the main thing: they don't care about their implementation to be super-good. They do it because they are nice to developers who forget to implement their hash code properly, not because they want their implementation to be any good. As long as it is somewhat useable, it's OK with them. – Sergey Kalinichenko Mar 26 '16 at 15:15
  • Evidence that the Oracle JVM does not use memory address: https://stackoverflow.com/a/16105878/545127 – Raedwald Mar 26 '16 at 15:20
  • @Raedwald Evidence? "4 – Object address"? – Sergey Kalinichenko Mar 26 '16 at 15:21
  • Guys, the source is readily available, you can just check for this implementation. In the general case it can be anything that does not change over the lifetime of an Object. – Tassos Bassoukos Mar 26 '16 at 16:21
  • @TassosBassoukos Right - that's the link I pasted on my last edit. – Sergey Kalinichenko Mar 26 '16 at 16:28
  • @TassosBassoukos: You are aware that there is more than one implementation of the JVM? cf. https://en.wikipedia.org/wiki/List_of_Java_virtual_machines – meriton Mar 26 '16 at 16:29
  • @meriton OP is asking for "any widely used JVM", so Oracle's implementation fits this description. – Sergey Kalinichenko Mar 26 '16 at 16:31
  • But is it the only one, i.e. are there no other "widely used JVM"? (I don't know, I just meant to point out that simply checking the source code of one JVM might not be enough to conclusively answer OP's question) – meriton Mar 26 '16 at 16:38
  • [the latest docs for Object](https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/lang/Object.html#hashCode()) say nothing about memory address and hash code. The quote referenced in the above answer is outdated and should not be taken seriously. – Kein Feb 11 '20 at 00:31