hashCode uniqueness

Question

Is it possible for two instances of Object to have the same hashCode()?

In theory an object's hashCode is derived from its memory address, so all hashCodes should be unique, but what if objects are moved around during GC?

If objects are moved around, wont their addresses also change, and therefore their hashcodes ? — Vineet Reynolds, Sep 04 '09 at 19:19
@Brian, the OP's concern seems to be about what happens during the compaction phase of garbage collection. During compaction, objects can be moved across addresses. — Vineet Reynolds, Sep 04 '09 at 19:27
The Sun JVM bases the Object hash code on an unmoving handle to the object. — erickson, Sep 04 '09 at 19:43
Handles were got rid of about a decade ago (early HotSpot (for 1.2.2) briefly reintroduced them). — Tom Hawtin - tackline, Sep 04 '09 at 20:07
Vineet: A lot of object moving happens in the young generation (eden and survivor space). All the live objects are moved out, then a whole area (almost all dead objects) is dropped as one. — Tom Hawtin - tackline, Sep 04 '09 at 20:09
Equal objects (I don't know about objects of type Object, but for your own classes) where obj.equals(obj2) == true should have the same hashcode. — mk12, Sep 04 '09 at 20:28
@Tom, honestly I didnt know that, just discovering these properties of the JVM now. Don't I sound a bit foolish? — Vineet Reynolds, Sep 04 '09 at 20:29
I'm going to look more foolish saying this, but the hashcode values before and after a GC are going to be the same, despite the object being moved around in memory. It is actually done to ensure that the contract of the hashcode() method to have a consistent hashcode value during object lifetime is maintained. I've hereby eaten my first comment. — Vineet Reynolds, Sep 04 '09 at 21:17
[The `hashCode` of an object is not derived from the memory address of the object in a typical modern JVM](https://stackoverflow.com/questions/2237720/what-is-an-objects-hash-code-if-hashcode-is-not-overridden). — Raedwald, Oct 09 '19 at 12:42
[And again](https://stackoverflow.com/questions/16418713/does-hashcode-number-represent-the-memory-address). — Raedwald, Oct 09 '19 at 12:45

score 16 · Answer 1 · answered Sep 04 '09 at 19:21

16

I think the docs for object's hashCode method state the answer.

"As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)"

answered Sep 04 '09 at 19:21

RichardOD

28,883
9
61
81

4

I read the javadoc, but I'm not sure to understand what "reasonably practical" is supposed to mean. – Eleco Sep 04 '09 at 19:25
3

Well for a start on some architectures the address space is bigger than an int, so two distinct values could yield the same value. – RichardOD Sep 04 '09 at 19:32
@elec "reasonably practical" means you don't write a 10k line method to make it absolutely unique at all (CPU) costs. – sal Sep 04 '09 at 19:45
In practice "reasonably practical" means anything goes. It's a quality of implementation matter. – Tom Hawtin - tackline Sep 04 '09 at 20:11

Tom Hawtin - tackline · Accepted Answer · 2017-12-18T03:50:20.727

15

Given a reasonable collection of objects, having two with the same hash code is quite likely. In the best case it becomes the birthday problem, with a clash with tens of thousands of objects. In practice objects a created with a relatively small pool of likely hash codes, and clashes can easily happen with merely thousands of objects.

Using memory address is just a way of obtaining a slightly random number. The Sun JDK source has a switch to enable use of a Secure Random Number Generator or a constant. I believe IBM (used to?) use a fast random number generator, but it was not at all secure. The mention in the docs of memory address appears to be of a historical nature (around a decade ago it was not unusual to have object handles with fixed locations).

Here's some code I wrote a few years ago to demonstrate clashes:

class HashClash {
    public static void main(String[] args) {
        final Object obj = new Object();
        final int target = obj.hashCode();
        Object clash;
        long ct = 0;
        do {
            clash = new Object();
            ++ct;
        } while (clash.hashCode() != target && ct<10L*1000*1000*1000L);
        if (clash.hashCode() == target) {
            System.out.println(ct+": "+obj+" - "+clash);
        } else {
            System.out.println("No clashes found");
        }
    }
}

RFE to clarify docs, because this comes up way too frequently: CR 6321873

edited Dec 18 '17 at 03:50

answered Sep 04 '09 at 20:02

Tom Hawtin - tackline

145,806
30
211
305

So can we easily say that default hashcode will not guarantee uniqueness in Java? – Koray Tugay Jul 06 '14 at 15:15
For a given implementation of a JRE. It's easy to write a short program to find collisions. – Tom Hawtin - tackline Jul 06 '14 at 20:21
I just ran the code on Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode). Ran it twice, it clashed only one time every 10 billion times. And always with the same hashcode! I guess they don't change the random generator seed (which makes me shiver). – sscarduzio Dec 17 '17 at 17:47
1

@sscarduzio One clash in every 10 billion pairs would be extremely good - there's only 4 billion or so possibilities for a 32-bit int. – Tom Hawtin - tackline Dec 18 '17 at 03:48
1

oh! you are right. I misinterpreted how to use the code, the first collision already happens at 521,133,040th object. – sscarduzio Dec 20 '17 at 19:27

erickson · Answer 3 · 2009-09-04T19:46:11.050

10

Think about it. There are an infinite number of potential objects, and only 4 billion hash codes. Clearly, an infinity of potential objects share each hash code.

The Sun JVM either bases the Object hash code on a stable handle to the object or caches the initial hash code. Compaction during GC will not alter the hashCode(). Everything would break if it did.

edited Sep 04 '09 at 19:46

answered Sep 04 '09 at 19:34

erickson

265,237
58
395
493

Some good points, but it is important to remember that string's hashCode implementation (http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#hashCode()) is not the same implementation as object's hashCode. – RichardOD Sep 04 '09 at 19:46
Thanks! I realized too late that the question was specific to the Object class's implementation. I removed the string examples (though I think they are fun) and added some info I gathered from looking at the native code portion of the Sun JVM. – erickson Sep 04 '09 at 19:47
@erickson : Where do I start looking at the native code of the Object implementation. Kind of lost after opening Object.c .... – Vineet Reynolds Sep 04 '09 at 20:11
@erickson, nvm. Found the hashcode implementation in the hotspot sources. – Vineet Reynolds Sep 04 '09 at 20:46
1

Well, if you look at Object.java, you'll see that it uses System.identityHashCode, so you'll need to look at System.c. The upshot is that most of the good stuff happens in hotspot, specifically check out oops/oop.inline.hpp and oops/oop.cpp for entry points. – erickson Sep 04 '09 at 21:14
2

Thanks, I did find the hashcode implementation to be in runtime/sychronizer.cpp. It happens to be in FastHashCode method. – Vineet Reynolds Sep 04 '09 at 21:23

score 7 · Answer 4 · answered Sep 04 '09 at 19:28

7

Is it possible?

Yes.

Does it happen with any reasonable degree of frequency?

No.

answered Sep 04 '09 at 19:28

GWLlosa

23,995
17
79
116

3

Exactly this. Don't *count* on them being unique, but don't expect them to be the same. – Alex Feinman Sep 04 '09 at 19:38
...unless you have a number of objects. It's usually could enough for reasonably sized hash tables. – Tom Hawtin - tackline Sep 04 '09 at 20:10

score 1 · Answer 5 · answered Sep 04 '09 at 22:59

I assume the original question is only about the hash codes generated by the default Object implementation. The fact is that hash codes must not be relied on for equality testing and are only used in some specific hash mapping operations (such as those implemented by the very useful HashMap implementation).

As such they have no need of being really unique - they only have to be unique enough to not generate a lot of clashes (which will render the HashMap implementation inefficient).

Also it is expected that when developer implement classes that are meant to be stored in HashMaps they will implement a hash code algorithm that has a low chance of clashes for objects of the same class (assuming you only store objects of the same class in application HashMaps), and knowing about the data makes it much easier to implement robust hashing.

Also see Ken's answer about equality necessitating identical hash codes.

score 0 · Answer 6 · answered Sep 04 '09 at 22:55

Are you talking about the actual class Object or objects in general? You use both in the question. (And real-world apps generally don't create a lot of instances of Object)

For objects in general, it is common to write a class for which you want to override equals(); and if you do that, you must also override hashCode() so that two different instances of that class that are "equal" must also have the same hash code. You are likely to get a "duplicate" hash code in that case, among instances of the same class.

Also, when implementing hashCode() in different classes, they are often based on something in the object, so you end up with less "random" values, resulting in "duplicate" hash codes among instances of different classes (whether or not those objects are "equal").

In any real-world app, it is not unusual to find to different objects with the same hash code.

score -1 · Answer 7 · answered Sep 04 '09 at 20:22

-1

If there were as many hashcodes as memory addresses, then it would took the whole memory to store the hash itself. :-)

So, yes, the hash codes should sometimes happen to coincide.

answered Sep 04 '09 at 20:22

P Shved

96,026
17
121
165

1

No. Imagine a computer that has 2^N memory addresses. A variable with N bits is sufficiently large to hold 2^N distinct values. – Solomon Slow Dec 04 '14 at 19:06

hashCode uniqueness

7 Answers7

Linked

Related