15

Is it possible for two instances of Object to have the same hashCode()?

In theory an object's hashCode is derived from its memory address, so all hashCodes should be unique, but what if objects are moved around during GC?

Raedwald
  • 46,613
  • 43
  • 151
  • 237
Eleco
  • 3,194
  • 9
  • 31
  • 40
  • If objects are moved around, wont their addresses also change, and therefore their hashcodes ? – Vineet Reynolds Sep 04 '09 at 19:19
  • @Vineet: How do you "move" an object in java? – Brian Sep 04 '09 at 19:24
  • @Brian, the OP's concern seems to be about what happens during the compaction phase of garbage collection. During compaction, objects can be moved across addresses. – Vineet Reynolds Sep 04 '09 at 19:27
  • 1
    The Sun JVM bases the Object hash code on an unmoving handle to the object. – erickson Sep 04 '09 at 19:43
  • Handles were got rid of about a decade ago (early HotSpot (for 1.2.2) briefly reintroduced them). – Tom Hawtin - tackline Sep 04 '09 at 20:07
  • Vineet: A lot of object moving happens in the young generation (eden and survivor space). All the live objects are moved out, then a whole area (almost all dead objects) is dropped as one. – Tom Hawtin - tackline Sep 04 '09 at 20:09
  • Equal objects (I don't know about objects of type Object, but for your own classes) where obj.equals(obj2) == true should have the same hashcode. – mk12 Sep 04 '09 at 20:28
  • @Tom, honestly I didnt know that, just discovering these properties of the JVM now. Don't I sound a bit foolish? – Vineet Reynolds Sep 04 '09 at 20:29
  • I'm going to look more foolish saying this, but the hashcode values before and after a GC are going to be the same, despite the object being moved around in memory. It is actually done to ensure that the contract of the hashcode() method to have a consistent hashcode value during object lifetime is maintained. I've hereby eaten my first comment. – Vineet Reynolds Sep 04 '09 at 21:17
  • [The `hashCode` of an object is not derived from the memory address of the object in a typical modern JVM](https://stackoverflow.com/questions/2237720/what-is-an-objects-hash-code-if-hashcode-is-not-overridden). – Raedwald Oct 09 '19 at 12:42
  • [And again](https://stackoverflow.com/questions/16418713/does-hashcode-number-represent-the-memory-address). – Raedwald Oct 09 '19 at 12:45

7 Answers7

16

I think the docs for object's hashCode method state the answer.

"As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)"

RichardOD
  • 28,883
  • 9
  • 61
  • 81
  • 4
    I read the javadoc, but I'm not sure to understand what "reasonably practical" is supposed to mean. – Eleco Sep 04 '09 at 19:25
  • 3
    Well for a start on some architectures the address space is bigger than an int, so two distinct values could yield the same value. – RichardOD Sep 04 '09 at 19:32
  • @elec "reasonably practical" means you don't write a 10k line method to make it absolutely unique at all (CPU) costs. – sal Sep 04 '09 at 19:45
  • In practice "reasonably practical" means anything goes. It's a quality of implementation matter. – Tom Hawtin - tackline Sep 04 '09 at 20:11
15

Given a reasonable collection of objects, having two with the same hash code is quite likely. In the best case it becomes the birthday problem, with a clash with tens of thousands of objects. In practice objects a created with a relatively small pool of likely hash codes, and clashes can easily happen with merely thousands of objects.

Using memory address is just a way of obtaining a slightly random number. The Sun JDK source has a switch to enable use of a Secure Random Number Generator or a constant. I believe IBM (used to?) use a fast random number generator, but it was not at all secure. The mention in the docs of memory address appears to be of a historical nature (around a decade ago it was not unusual to have object handles with fixed locations).

Here's some code I wrote a few years ago to demonstrate clashes:

class HashClash {
    public static void main(String[] args) {
        final Object obj = new Object();
        final int target = obj.hashCode();
        Object clash;
        long ct = 0;
        do {
            clash = new Object();
            ++ct;
        } while (clash.hashCode() != target && ct<10L*1000*1000*1000L);
        if (clash.hashCode() == target) {
            System.out.println(ct+": "+obj+" - "+clash);
        } else {
            System.out.println("No clashes found");
        }
    }
}

RFE to clarify docs, because this comes up way too frequently: CR 6321873

Tom Hawtin - tackline
  • 145,806
  • 30
  • 211
  • 305
  • So can we easily say that default hashcode will not guarantee uniqueness in Java? – Koray Tugay Jul 06 '14 at 15:15
  • For a given implementation of a JRE. It's easy to write a short program to find collisions. – Tom Hawtin - tackline Jul 06 '14 at 20:21
  • I just ran the code on Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode). Ran it twice, it clashed only one time every 10 billion times. And always with the same hashcode! I guess they don't change the random generator seed (which makes me shiver). – sscarduzio Dec 17 '17 at 17:47
  • 1
    @sscarduzio One clash in every 10 billion pairs would be extremely good - there's only 4 billion or so possibilities for a 32-bit int. – Tom Hawtin - tackline Dec 18 '17 at 03:48
  • 1
    oh! you are right. I misinterpreted how to use the code, the first collision already happens at 521,133,040th object. – sscarduzio Dec 20 '17 at 19:27
10

Think about it. There are an infinite number of potential objects, and only 4 billion hash codes. Clearly, an infinity of potential objects share each hash code.

The Sun JVM either bases the Object hash code on a stable handle to the object or caches the initial hash code. Compaction during GC will not alter the hashCode(). Everything would break if it did.

erickson
  • 265,237
  • 58
  • 395
  • 493
  • Some good points, but it is important to remember that string's hashCode implementation (http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#hashCode()) is not the same implementation as object's hashCode. – RichardOD Sep 04 '09 at 19:46
  • Thanks! I realized too late that the question was specific to the Object class's implementation. I removed the string examples (though I think they are fun) and added some info I gathered from looking at the native code portion of the Sun JVM. – erickson Sep 04 '09 at 19:47
  • @erickson : Where do I start looking at the native code of the Object implementation. Kind of lost after opening Object.c .... – Vineet Reynolds Sep 04 '09 at 20:11
  • @erickson, nvm. Found the hashcode implementation in the hotspot sources. – Vineet Reynolds Sep 04 '09 at 20:46
  • 1
    Well, if you look at Object.java, you'll see that it uses System.identityHashCode, so you'll need to look at System.c. The upshot is that most of the good stuff happens in hotspot, specifically check out oops/oop.inline.hpp and oops/oop.cpp for entry points. – erickson Sep 04 '09 at 21:14
  • 2
    Thanks, I did find the hashcode implementation to be in runtime/sychronizer.cpp. It happens to be in FastHashCode method. – Vineet Reynolds Sep 04 '09 at 21:23
7

Is it possible?

Yes.

Does it happen with any reasonable degree of frequency?

No.

GWLlosa
  • 23,995
  • 17
  • 79
  • 116
1

I assume the original question is only about the hash codes generated by the default Object implementation. The fact is that hash codes must not be relied on for equality testing and are only used in some specific hash mapping operations (such as those implemented by the very useful HashMap implementation).

As such they have no need of being really unique - they only have to be unique enough to not generate a lot of clashes (which will render the HashMap implementation inefficient).

Also it is expected that when developer implement classes that are meant to be stored in HashMaps they will implement a hash code algorithm that has a low chance of clashes for objects of the same class (assuming you only store objects of the same class in application HashMaps), and knowing about the data makes it much easier to implement robust hashing.

Also see Ken's answer about equality necessitating identical hash codes.

Guss
  • 30,470
  • 17
  • 104
  • 128
0

Are you talking about the actual class Object or objects in general? You use both in the question. (And real-world apps generally don't create a lot of instances of Object)

For objects in general, it is common to write a class for which you want to override equals(); and if you do that, you must also override hashCode() so that two different instances of that class that are "equal" must also have the same hash code. You are likely to get a "duplicate" hash code in that case, among instances of the same class.

Also, when implementing hashCode() in different classes, they are often based on something in the object, so you end up with less "random" values, resulting in "duplicate" hash codes among instances of different classes (whether or not those objects are "equal").

In any real-world app, it is not unusual to find to different objects with the same hash code.

Ken
  • 814
  • 4
  • 3
-1

If there were as many hashcodes as memory addresses, then it would took the whole memory to store the hash itself. :-)

So, yes, the hash codes should sometimes happen to coincide.

P Shved
  • 96,026
  • 17
  • 121
  • 165
  • 1
    No. Imagine a computer that has 2^N memory addresses. A variable with N bits is sufficiently large to hold 2^N distinct values. – Solomon Slow Dec 04 '14 at 19:06