Java's System.identityHashCode()

Returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode().

That hash code is based on the object identity, so it will always be the same for the same object, no matter if the object is mutated between calls to identityHashCode().
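The two properties described above can be checked with a small sketch. `MutablePoint` and `IdentityHashDemo` are hypothetical names made up for this illustration; only `System.identityHashCode()` and `Objects.hash()` are real JDK calls.

```java
import java.util.Objects;

// Hypothetical class for illustration: hashCode() is overridden and depends on mutable fields.
final class MutablePoint {
    int x;
    int y;

    MutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutablePoint
                && ((MutablePoint) o).x == x
                && ((MutablePoint) o).y == y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

final class IdentityHashDemo {
    public static void main(String[] args) {
        MutablePoint p = new MutablePoint(1, 2);
        int overriddenBefore = p.hashCode();
        int identityBefore = System.identityHashCode(p);

        p.x = 42; // mutate a field that the overridden hashCode() depends on

        // The overridden hash follows the field values; the identity hash ignores them.
        System.out.println("overridden hash changed: " + (p.hashCode() != overriddenBefore));
        System.out.println("identity hash changed:   "
                + (System.identityHashCode(p) != identityBefore));
    }
}
```

Note that this only shows stability for one object. It says nothing about whether two different objects can share the same identity hash.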

In addition to that, I assumed that (with some Java runtimes) there would be no hash collisions between any two living objects. That assumption rests on the statement by Oracle quoted below, which is inaccurate, as Jai's answer and another bug report show. This basically invalidates my original question.

[...] garbage objects are readily reclaimed and the address space is reused. The collisions result from address space reuse. If the original object remains live (not GCed) then you will not encounter this problem.

Source

In .Net, there is RuntimeHelpers.GetHashCode(), which fulfills the first condition, but not the second:

Note that GetHashCode always returns identical hash codes for equal object references. However, the reverse is not true: equal hash codes do not indicate equal object references. A particular hash code value is not unique to a particular object reference; different object references can generate identical hash codes.

So is there anything like Java's identityHashCode() in .Net?

Edit:

It was suggested that this is the same question as Memory address of an object in C#. It is not: the memory address alone cannot be used here, because memory management moves objects around, so the address may change during the lifetime of an object.

Evgeniy Berezovsky
  • 1
    Java doesn't make any guarantee that distinct objects will have distinct hash codes. – shmosel Nov 13 '18 at 03:24
  • @shmosel The docs don't guarantee, but Oracle's comment to the bug report I quoted does say as much. – Evgeniy Berezovsky Nov 13 '18 at 03:27
  • 1
    Guarantees are not the same as implementation details. – mjwills Nov 13 '18 at 03:28
  • No, the hash method returns an `int` (32-bit value) which is normally beyond the address range that the heap memory size can allocate. This will cause objects at diff memory to return the same `int` value. – Jai Nov 13 '18 at 03:30
  • @mjwills Re: your first comment: It will allow me to write a `int CompareTo(X other)` that falls back to `RuntimeHelpers.GetHashCode(this) - RuntimeHelpers.GetHashCode(other)` to get a stable sort. – Evgeniy Berezovsky Nov 13 '18 at 03:32
  • Sure - but I suspect it is the best that you have. An alternative, when sorting, is to do a projection including the original index and then sort by the value **then the index**. – mjwills Nov 13 '18 at 03:40
  • 1
    Why would you remove the `I am trying to build a stable sort algorithm.` statement I added to your post? It really gave useful context to your problem. – mjwills Nov 13 '18 at 03:49
  • 1
    I also suspect you are misunderstanding how Java does this. It does guarantee that a hash code will not change for an object (as per https://stackoverflow.com/questions/3796699/will-hashcode-return-a-different-int-due-to-compaction-of-tenure-space). But if a GC occurs, memory is compacted and an object is moved elsewhere then nothing stops a new instance of the type occupying the same memory address as the 'original' object. Then, you will have two objects with the same hashcode. As such, you can't state `In addition to that, there will not be hash collisions between any two living objects`. – mjwills Nov 13 '18 at 03:56
  • 1
    While Java's `Object#hashCode()` and `System.identityHashCode()` guarantee that the value returned for a particular *instance* never changes, they do not promise a unique value for each object. In fact, it only returns the trailing 32-bit memory address in the heap of the object - this means that `0x 0 FFFF FFFF` and `0x 1 FFFF FFFF` would both return `0x FFFF FFFF`. You can modify the bug report's example so that a static list stores each `obj` to prevent GC, and collisions *will* still occur. – Jai Nov 13 '18 at 05:03
  • `so it will always be the same for the same object, no matter if the object is mutated between calls` I am not sure this is true either - https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode%28%29 ("provided no information used in equals comparisons on the object is modified" means that the hash code **can** change if the object is mutated) – mjwills Nov 13 '18 at 05:49
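The projection mjwills suggests in the comments above (sort by value, then by original index) could look roughly like the following sketch. It is written in Java to match the other code in this thread, and `IndexTieBreakSort`, `Indexed` and `sortStable` are hypothetical names. Java's own `List.sort` is already stable for object lists, so the explicit index tie-break only matters when the underlying sort is not.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class IndexTieBreakSort {
    // Pairs an element with the position it had before sorting.
    private static final class Indexed<T> {
        final T value;
        final int index;

        Indexed(T value, int index) {
            this.value = value;
            this.index = index;
        }
    }

    // Sorts by byValue and breaks ties by original position, so equal elements keep their order.
    static <T> List<T> sortStable(List<T> input, Comparator<? super T> byValue) {
        List<Indexed<T>> decorated = new ArrayList<>(input.size());
        for (int i = 0; i < input.size(); i++) {
            decorated.add(new Indexed<>(input.get(i), i));
        }

        Comparator<Indexed<T>> order = Comparator
                .comparing((Indexed<T> e) -> e.value, byValue)
                .thenComparingInt(e -> e.index);
        decorated.sort(order);

        List<T> result = new ArrayList<>(input.size());
        for (Indexed<T> e : decorated) {
            result.add(e.value);
        }
        return result;
    }
}
```

Unlike a hash-code fallback, the index is unique per element and reflects the input order, so no two elements ever compare as equal by accident.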

3 Answers

4

Currently, Java's Object#hashCode() and System#identityHashCode() do not guarantee unique return values. There are already questions about this, and this is an example.

You have mentioned a bug report which states that the collisions occurred because objects were garbage collected and the same memory address was reused. However, modifying the same test case proves otherwise:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IdentityHashCollisions
{
    public static void main(String[] args)
    {
        List<Object> allObjs = new ArrayList<>(); // strong references keep every object alive
        Set<Integer> hashes = new HashSet<>(1024);

        int colls = 0;
        for (int n = 0; n < 100000; n++)
        {
            // 'new' is deliberate: Integer.valueOf(88) would return one cached instance
            Integer obj = new Integer(88);
            allObjs.add(obj); // keep a strong reference to prevent GC
            int ihash = System.identityHashCode(obj);
            // Set.add returns false if the value is already present
            if (!hashes.add(ihash))
            {
                System.err.println("System.identityHashCode() collision!");
                colls++;
            }
        }

        System.out.println("created 100000 different objects - "
                + colls
                + " times with the same value for System.identityHashCode()");

        System.out.println("Size of all objects is " + allObjs.size());
        System.out.println("Size of hashset of hash values is " + hashes.size());
    }
}

Result:

System.identityHashCode() collision!
System.identityHashCode() collision!
System.identityHashCode() collision!
created 100000 different objects - 3 times with the same value for System.identityHashCode()
Size of all objects is 100000
Size of hashset of hash values is 99997

In the linked SO question, it was also mentioned that some JRE implementations greatly reduce the collision rate. However, no implementation seems to have eliminated collisions entirely, so there is no way to ensure unique hash codes even in Java.

So don't rely on a single source. The person who wrote that comment is just one member of the Oracle team, and most likely not the person who designed this.

In both C# and Java, you would have to create your own unique number generator of some kind. The solution provided by NPras seems to do that for .NET.
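On the Java side, one possible shape for such a generator is a counter plus an identity-based map. This is only a minimal sketch: `IdentityIdGenerator` and `idOf` are hypothetical names, and the map holds strong references, so registered objects are never garbage collected while the generator is alive.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityIdGenerator {
    // IdentityHashMap compares keys with ==, so equals()/hashCode() overrides are ignored.
    private final Map<Object, Long> ids = new IdentityHashMap<>();
    private long nextId = 1;

    // Returns the same id for the same instance and a fresh id for every new instance.
    synchronized long idOf(Object obj) {
        Long id = ids.get(obj);
        if (id == null) {
            id = nextId++;
            ids.put(obj, id);
        }
        return id;
    }
}
```

Because ids come from a counter rather than from a hash, two distinct objects cannot receive the same id until the `long` counter wraps around.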

Jai
  • 1
    You cured me from believing that Java will prevent collisions. Thinking about it, a non-colliding implementation would also not be really fast, because it would need synchronization of threads etc. – Evgeniy Berezovsky Nov 13 '18 at 05:52
  • @EugeneBeresovsky In order to prevent collision, JVM has to prevent you from creating more than 2³² objects (this is even assuming each object is 1 bit) within its runtime. Doesn't sound like a smart idea too right? If the hashcode implementation returned a `long` then it could be quite safe for maybe... 10 to 20 years? – Jai Nov 13 '18 at 05:55
1

I would refer you to the following answer from Eric Lippert (who was part of the C# language design & compiler team) where he suggested using ObjectIDGenerator.

To generate unique ids for objects you could use the aptly named ObjectIDGenerator that we conveniently provide for you

Looking at the reference source (good thing they open-sourced the framework now), it does use RuntimeHelpers.GetHashCode() but also handles the potential collision by storing the references separately.

Do note his warning about object lifetime. If you need it for transient objects, he suggested you reimplement the generator, which is much easier now that you have access to the source.
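The idea of reimplementing the generator with weak references can be sketched in Java, the language used elsewhere in this thread. This is not the .NET ObjectIDGenerator source; `WeakIdentityIdGenerator`, `WeakKey` and `idOf` are hypothetical names. It uses the identity hash only to pick a bucket and resolves collisions by comparing references, which is the approach described above.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class WeakIdentityIdGenerator {
    // Map key that compares referents by identity and does not keep them alive.
    private static final class WeakKey extends WeakReference<Object> {
        private final int hash;

        WeakKey(Object referent, ReferenceQueue<Object> queue) {
            super(referent, queue);
            // Cached so the key can still be found and removed after the referent is cleared.
            this.hash = System.identityHashCode(referent);
        }

        @Override
        public int hashCode() {
            return hash;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof WeakKey)) return false;
            Object mine = get();
            // Identity comparison resolves identity-hash collisions between live objects.
            return mine != null && mine == ((WeakKey) other).get();
        }
    }

    private final ReferenceQueue<Object> cleared = new ReferenceQueue<>();
    private final Map<WeakKey, Long> ids = new HashMap<>();
    private long nextId = 1;

    synchronized long idOf(Object obj) {
        Objects.requireNonNull(obj, "obj");
        purge();
        // Probe key: not registered with the queue, used only for the lookup.
        Long id = ids.get(new WeakKey(obj, null));
        if (id == null) {
            id = nextId++;
            ids.put(new WeakKey(obj, cleared), id);
        }
        return id;
    }

    // Drops entries whose objects have been garbage collected.
    private void purge() {
        Reference<?> ref;
        while ((ref = cleared.poll()) != null) {
            ids.remove(ref);
        }
    }
}
```

Entries are purged lazily on the next `idOf` call, so memory for collected objects is reclaimed only while the generator stays in use.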

NPras
  • 1
    This is the closest the OP is going to get to an answer, I suspect. But it is hard to see how this would be useful in implementing a "stable" sort. – mjwills Nov 13 '18 at 04:27
  • True. But that should probably be in a different question now that it's edited out :) Besides, I don't see how java's `identityHashCode` would help either. – NPras Nov 13 '18 at 04:35
  • That is true. :) – mjwills Nov 13 '18 at 04:41
  • Thanks NPras. That ObjectIDGenerator however uses strong references, so it will never free memory. Using a cache with weak references a la [WeakReference](https://learn.microsoft.com/en-us/dotnet/api/system.weakreference?view=netframework-4.7.2) would do the trick. – Evgeniy Berezovsky Nov 13 '18 at 05:49
  • Yeah, that's what he suggested. Re-implement the class using weak references. – NPras Nov 13 '18 at 06:03
0

To cut a long story short:

The equivalent of Java's System.identityHashCode( Object obj ) in .NET is System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode( object obj ).

Mike Nakis