
I'm looking for the algorithm of Object.hashCode().

This code is native in Object.java.

Is this because

(a) the code is written in assembly and was never in Java or any other high-level language at all

or

(b) it simply isn't disclosed?

In either case, I'd like to get hold of the algorithm (pseudocode or a detailed explanation) for how hashCode() is calculated: which parameters go into the calculation, and what is the calculation itself?

Please note: it's the hashCode() of Object I'm looking for, not another one such as that of String or HashMap/Hashtable.
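For reference, the value that Object.hashCode() would return for any object, even one whose class overrides hashCode() (as String does), is available through System.identityHashCode. A small illustration (the class name is made up):

```java
class IdentityVsOverridden {
    public static void main(String[] args) {
        String s = new String("abc");
        // String overrides hashCode(): the result depends only on the characters
        System.out.println("String.hashCode():          " + s.hashCode());
        // System.identityHashCode returns what Object.hashCode() would have returned for s
        System.out.println("System.identityHashCode(s): " + System.identityHashCode(s));

        Object o = new Object();
        // Object does not override hashCode(), so both calls give the same identity hash
        System.out.println(o.hashCode() == System.identityHashCode(o)); // prints true
    }
}
```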

//==========================================================================

The new Java docs (JDK 8) now say:

"The value returned by hashCode() is the object's hash code, which is the object's memory address in hexadecimal." 
Roam

4 Answers


The implementation of the native hashCode method depends on the JVM. By default, HotSpot returns a random number; you can check this in the source code (function get_next_hash).
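As a rough illustration of what "random" means here: in the OpenJDK 8 HotSpot sources, the default branch of get_next_hash appears to be Marsaglia's xor-shift generator with per-thread state, whose result is masked to the hash width and replaced by 0xBAD if it comes out as 0. The following is a hypothetical Java re-expression of that branch, not the JVM's code. The class name is made up, the fixed initial values of the Y, Z and W state words are assumed to match HotSpot's Thread constructor, and the caller-supplied seed stands in for the os::random() value HotSpot uses.

```java
class XorShiftIdentityHash {
    // Four words of generator state; HotSpot keeps these per thread.
    // Y, Z and W start from fixed constants (assumed to match HotSpot's Thread constructor).
    private int x, y = 842502087, z = 0x8767, w = 273326509;

    XorShiftIdentityHash(int seed) {
        // HotSpot seeds X from os::random(); a caller-supplied seed is an assumption here
        this.x = seed;
    }

    int nextHash() {
        // Marsaglia xor-shift step (>>> mirrors the unsigned shifts of the C++ code)
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;

        int value = v & 0x7FFFFFFF;        // keep 31 hash bits, as on 64-bit HotSpot
        return value == 0 ? 0xBAD : value; // 0 means "no hash yet", so it is never returned
    }

    public static void main(String[] args) {
        XorShiftIdentityHash gen = new XorShiftIdentityHash(12345);
        for (int i = 0; i < 5; i++) {
            System.out.println(Integer.toHexString(gen.nextHash()));
        }
    }
}
```

Nothing about the object itself (address, class, size) enters this computation; the only inputs are the generator state of the thread that first asks for the hash.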

nkukhar

Despite what the Javadoc says, the algorithm may use the address as just one input, if at all. This means that even when new objects reuse the same address in eden space, they won't end up with the same hashCode.

There are a number of algorithms it might be using, and not all of them use the address.
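To make "a number of algorithms" more concrete: in OpenJDK 8 HotSpot, get_next_hash appears to choose among several strategies via the hashCode VM option (-XX:hashCode=N, which newer JDKs may treat as experimental). The hypothetical sketch below mimics modes 0 to 4 as I understand them: a global Park-Miller random number, address bits mixed with a value refreshed at safepoints, the constant 1, a global sequence counter, and the raw address. Java code cannot observe an object's address, so the address is just a parameter here; the class name, the seed and the fixed stwRandom value are all assumptions for illustration.

```java
class IdentityHashModes {
    private static final long HASH_MASK = 0x7FFFFFFFL; // 31 hash bits, as on 64-bit HotSpot

    private long parkMillerSeed;     // state for mode 0 (hypothetical seed)
    private int sequence = 0;        // global counter for mode 3
    private final int stwRandom;     // HotSpot refreshes this at safepoints; fixed here

    IdentityHashModes(long seed, int stwRandom) {
        this.parkMillerSeed = seed;
        this.stwRandom = stwRandom;
    }

    /** address is a stand-in: real Java code cannot observe an object's address. */
    int hash(int mode, long address) {
        long value;
        switch (mode) {
            case 0: // global Park-Miller "minimal standard" random number generator
                parkMillerSeed = (parkMillerSeed * 16807L) % 2147483647L;
                value = parkMillerSeed;
                break;
            case 1: { // address bits mixed with a random value
                long addrBits = address >>> 3;
                value = addrBits ^ (addrBits >>> 5) ^ stwRandom;
                break;
            }
            case 2: // constant, for sensitivity testing only
                value = 1;
                break;
            case 3: // simple global sequence number
                value = ++sequence;
                break;
            case 4: // the raw address
                value = address;
                break;
            default:
                throw new IllegalArgumentException("mode " + mode + " is not sketched here");
        }
        long masked = value & HASH_MASK;
        return masked == 0 ? 0xBAD : (int) masked; // 0 is reserved for "no hash yet"
    }

    public static void main(String[] args) {
        IdentityHashModes modes = new IdentityHashModes(1L, 0);
        for (int mode = 0; mode <= 4; mode++) {
            System.out.println("mode " + mode + ": " + Integer.toHexString(modes.hash(mode, 0x7f001234L)));
        }
    }
}
```

Only modes 1 and 4 look at the address at all, and mode 1 mixes it with a random value, which is consistent with the point that two objects at the same eden address need not share a hash.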

Note: the hashCode() is 31 bits wide, so it is never negative.

BTW, on HotSpot you can set it with Unsafe.putInt(object, 1L, value).

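import java.util.LinkedHashSet;
import java.util.Set;

public class IdentityHashStats {
    public static void main(String[] args) {
        Set<Integer> ints = new LinkedHashSet<>();
        int negative = 0, nonneg = 0;
        for (int i = 0; i < 100; i++) {
            System.gc(); // encourage eden to be reused between batches
            for (int j = 0; j < 100; j++) {
                int h = new Object().hashCode();
                ints.add(h);
                if (h < 0) negative++;
                else nonneg++;
            }
        }
        System.out.println("unique: " + ints.size() + " negative: " + negative + " non-neg: " + nonneg);
    }
}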

prints

unique: 10000 negative: 0 non-neg: 10000

Using Unsafe

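import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeHashDemo {
    public static void main(String[] args) throws Exception {
        // Get the Unsafe singleton reflectively (unsupported, HotSpot-specific API)
        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        Unsafe unsafe = (Unsafe) theUnsafe.get(null);

        Object o = new Object();
        // offset 1 reads the hash bits of the mark word (64-bit, little-endian HotSpot)
        System.out.println("From header " + Integer.toHexString(unsafe.getInt(o, 1L)));
        // sets the hashCode lazily
        System.out.println("o.hashCode()  " + Integer.toHexString(o.hashCode()));
        // it's here now.
        System.out.println("after hashCode() From header " + Integer.toHexString(unsafe.getInt(o, 1L)));
        unsafe.putInt(o, 1L, 0x12345678);
        System.out.println("after change o.hashCode()  " + Integer.toHexString(o.hashCode()));
    }
}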

prints

From header 0
o.hashCode()  2260e277
after hashCode() From header 2260e277
after change o.hashCode()  12345678
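Why offset 1? Assuming the classic (JDK 8 era) 64-bit HotSpot object header, the first 8 bytes of an object are the mark word, and for an unlocked object the identity hash sits in bits 8 to 38, above 2 lock bits, 1 biased-lock bit, 4 age bits and 1 unused bit. On a little-endian machine, a 4-byte read at byte offset 1 therefore picks up exactly the hash bits plus one unused zero bit. The hypothetical helper below decodes a mark word under that assumed layout; other header layouts may differ, and the class and method names are made up.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class MarkWordDecoder {
    // Assumed classic 64-bit HotSpot layout for an unlocked object (most to least significant):
    // unused:25 | hash:31 | unused:1 | age:4 | biased_lock:1 | lock:2
    static int hashBits(long mark) { return (int) ((mark >>> 8) & 0x7FFFFFFFL); }
    static int ageBits(long mark)  { return (int) ((mark >>> 3) & 0xF); }
    static int lockBits(long mark) { return (int) (mark & 0x3); }

    public static void main(String[] args) throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        Object o = new Object();
        int h = o.hashCode();               // forces the identity hash into the header
        long mark = unsafe.getLong(o, 0L);  // the mark word is the first 8 bytes of the object
        System.out.println(Integer.toHexString(h) + " vs decoded "
                + Integer.toHexString(hashBits(mark)) + ", lock bits " + lockBits(mark));
    }
}
```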
Peter Lawrey
  • "hashCode() using only the address of the object" would be consistent with the specs at http://www.docjar.com/docs/api/java/lang/Object.html#hashCode. But then you're saying "even though new objects use the same address, they won't have the same hashCode". How, then, is this tie broken? Is it some memory location other than the object's reference address (like that of some member, the ending address or amount of memory it uses), or something like a timestamp? – Roam Jul 31 '13 at 18:15
  • Although it meets "same hashCode for same object at every call of hashCode() within one execution", using solely the address of the object for this calculation doesn't feel quite right: it's too tied to the memory layout and not that strong on the randomness of the hash code. – Roam Jul 31 '13 at 18:18
  • It would be an immutable "handle" identifier that remains constant for the lifetime of the object. It is most likely VM-implementation-dependent and will likely have no relationship to the memory address, because GC operations can relocate any Java object in physical memory. – peterk Aug 28 '14 at 16:33
  • @peterk I believe that in a very early version (before Java 1.2), objects could not be moved and the address was indeed part of the hash. – Peter Lawrey Aug 30 '14 at 07:06
  • Likely so in the past, as it was easy to implement, but surely not the case today, nor proper. The only appropriate assumption is that it is invariant during the lifespan of the object and unique amongst all objects in the VM instance. – peterk Sep 01 '14 at 03:39
  • The hashCode is not unique among the instances in the VM. See comments on [this answer to a different question](http://stackoverflow.com/a/2427655/1048186) – Josiah Yoder Oct 14 '15 at 19:10
  • @JosiahYoder You only need about 64K objects for it to be likely that two of them have the same randomly generated hashCode. – Peter Lawrey Oct 15 '15 at 20:23
  • Yes, I found in [my experiments](http://stackoverflow.com/questions/2427631/how-is-hashcode-calculated-in-java/2427655#2427655) it took on average about 80,000 newly created objects to get a collision. This is a purely experimental result. I don't know the theory, but I'm surprised a 32-bit number can only index 64k objects. Are only 16 bits actually used? – Josiah Yoder Dec 11 '15 at 15:09
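The "about 64K" figure is the birthday problem rather than a sign that only 16 bits are used. Among n uniformly random values drawn from N possibilities, the chance of at least one repeat passes 50% around n ≈ √(2N ln 2), and the expected number of draws until the first repeat is roughly √(πN/2); for a 31-bit hash (N = 2^31) both land in the 50,000s. The helper below computes both figures, assuming the identity hashes behave like independent uniform random values, which is an idealization of any real generator; the class name is made up.

```java
class BirthdayBound {
    // Smallest n for which n independent, uniformly random values drawn from 2^bits
    // possibilities are more likely than not to contain at least one repeat.
    static long fiftyPercentCollisionCount(int bits) {
        double n = Math.pow(2, bits);
        double pNoCollision = 1.0;
        long k = 0;
        while (pNoCollision > 0.5) {
            pNoCollision *= (n - k) / n; // value k+1 must avoid the k values already drawn
            k++;
        }
        return k;
    }

    // Standard approximation of the expected number of draws until the first repeat.
    static double expectedDrawsToFirstCollision(int bits) {
        return Math.sqrt(Math.PI * Math.pow(2, bits) / 2);
    }

    public static void main(String[] args) {
        for (int bits : new int[] {16, 31, 32}) {
            System.out.printf("%d bits: 50%% at %d objects, expected first repeat ~%.0f%n",
                    bits, fiftyPercentCollisionCount(bits), expectedDrawsToFirstCollision(bits));
        }
    }
}
```

Because the collision point grows with the square root of N, a 31-bit space already produces likely repeats after a few tens of thousands of objects.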

hashCode is a native method, which means that native code inside the JVM is called internally. The reason is that hashCode internally tries to generate a number based on the object's memory location. This code is machine-dependent and probably written in C or C++.

But if you are really interested in seeing the native code, follow this link:

http://hg.openjdk.java.net/jdk7/jdk7-gate/jdk/file/e947a98ea3c1/src/share/native/java/
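Whatever the JVM uses as input, the identity hash cannot simply be recomputed from the object's current location, because the hashCode() contract requires the value to stay the same for the object's lifetime while a moving collector may relocate it. A small check of that contract (the class name is made up; the allocation loop only makes a collection likely, it does not guarantee that the object moves):

```java
import java.util.ArrayList;
import java.util.List;

class HashSurvivesGc {
    public static void main(String[] args) {
        Object o = new Object();
        int before = o.hashCode();   // first call assigns the identity hash

        // Churn the heap so a collector is likely (not guaranteed) to move o
        List<byte[]> garbage = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            garbage.add(new byte[10_000]);
            if (garbage.size() > 100) garbage.clear();
        }
        System.gc();

        // The contract requires the same value afterwards, whether or not o moved
        System.out.println(before == o.hashCode()); // prints true
    }
}
```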

Juned Ahsan
  • Well, that's the thing: is the object's memory location the only parameter that goes into the hashCode() calculation? How is it done? On the lower-end bits of the address, maybe? – Roam Jul 31 '13 at 18:21

It's because it relies on low-level details that aren't exposed to Java code. Some basic parts of the standard library (like java.lang.Object) must be implemented in native code.

As an aside, you can find at least one interesting article that goes into more detail about the HotSpot implementation.

Jeremy Roman