0

I have a token class that uses object identity (as in equals just returns tokenA == tokenB). I'd like to use it in a TreeSet. This means that I need to implement a comparison between two tokens that is compatible with reference equality.I don't care about the specific implementation, so long as it is consistent with equals and fulfills the contract (as per TreeSet: "Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface.")

Note: these tokens are created on multiple threads, and may be compared on different threads than they were created on.

What would be the best method to go about doing so?

Ideas I've tried:

  • Using System.identityHashCode - the problem with this is that it is not guaranteed that two different objects will always have a different hashcode. And due to the birthday paradox you only need about 77k tokens before two will collide (assuming that System.identityHashCode is uniformly distributed over the entire 32-bit range, which may not be true...)
  • Using a comparator over the default Object.toString method for each token. Unfortunately, under the hood this just uses the hash code (same thing as above).
  • Using an int or long unique value (read: static counter -> instance variable). This bloats the size, and makes multithreading a pain (not to mention making object creation effectively singlethreaded) (AtomicInteger / AtomicLong for the static counter helps somewhat, but its the size bloat that's more annoying here).
  • Using System.identityHashCode and a static disambiguation map for any collisions. This works, but is rather complex. Also, Java by default doesn't have a ConcurrentWeakValueHashMultiMap (isn't that a mouthful), which means that I've got to pull in an external dependency (or write my own - probably using something similar to this) to do so, or suffer a (slow) memory leak, or use finalizers (ugh). (And I don't know if anyone implements such a thing...)

By the way, I can't simply punt the problem and assume unique objects have unique hash codes. That's what I was doing, but the assertion fired in the comparator, and so I dug into it, and, lo and behold, on my machine the following:

import java.util.*;
import java.util.Collections.*;
import java.lang.*;

public class size {
    public static void main(String[] args) {
        Map<Integer, Integer> soFar = new HashMap<>();
        for (int i = 1; i <= 1_000_000; i++) {
            TokenA t = new TokenA();
            int ihc = System.identityHashCode(t);
            if (soFar.containsKey(ihc)) {
                System.out.println("Collision: " + ihc +" @ object #" + soFar.get(ihc) + " & " + i);
                break;
            }
            soFar.put(ihc, i);
        }
    }
}
class TokenA {

}

prints

Collision: 2134400190 @ object #62355 & 105842

So collisions definitely do exist. So, any suggestions?

Community
  • 1
  • 1
TLW
  • 1,373
  • 9
  • 22
  • it's better to say what are you doing with this tokens. maybe you dont need all this stuff – AdamSkywalker Jun 17 '16 at 16:31
  • Why do you think a `TreeSet` must implement `compareTo`? – Grim Jun 17 '16 at 16:31
  • Looks like you're trying to implement large scale and distributed UUID's. If so, check http://stackoverflow.com/questions/2671858/distributed-sequence-number-generation for and idea. – Grasshopper Jun 17 '16 at 16:35
  • I would just take an incrementing `AtomicLong` as ordering property. 8 bytes are no bloat and they work work still very well in concurrent scenarios. – zapl Jun 17 '16 at 16:36
  • You don't need a ConcurrentHashMultimapWhatever. What you're describing there is, in fact, the strategy used by Guava's Ordering.arbitrary, but even so you're much much better off writing a real Comparator. – Louis Wasserman Jun 17 '16 at 16:49
  • @PeterRader - Quote, "The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used. " So either the tokens need to implement compareTo or I need a comparator - which needs to implement compare. Either way, I need a way to compare two of these tokens. – TLW Jun 17 '16 at 17:26
  • @LouisWasserman - what do you mean by a "real Comparator"? Either I write Token.compareTo or I write a Comparator. But either way I need a way to implement a comparison between two Tokens that is consistent with reference equality. – TLW Jun 17 '16 at 17:27
  • @LouisWasserman - thank you for the reference to Guava's Ordering.arbitrary. It looks like they use my last approach. – TLW Jun 17 '16 at 17:32
  • @TLW You read that wrong, if noone implements `compareTo` the natural order is used. – Grim Jun 17 '16 at 17:37
  • By "a real Comparator" I meant one that doesn't use reference equality but one that uses some explicit property of the object, maybe a unique ID number from an AtomicLong. – Louis Wasserman Jun 17 '16 at 17:40
  • I do not believe the linked "duplicate" has anything like an answer to this question. I am looking for how to implement an *identity* comparator, which the linked question does not do. – TLW Jul 06 '16 at 22:24

1 Answers1

0

There is no magic:

Here is the problem tokenA == tokenB compares identity, tokenA.equals(tokenB) compares whatever is defined in .equals() for that class regardless of identity.

So two objects can have .equals() return true and not be the same object instance, they don't even have to the the same type or share a super type.

There is no short cuts:

Implementing compareTo() is whatever you want to compare that are attributes of the objects. You just have to write the code and make it do what you want, but compareTo() is probably not what you want. compareTo() is for comparisons, if you two things are not < or > each other in some meaningful way then Comparable and Comparator<T> are not what you want.

Equals that is identity is simple:

public boolean equals(Object o)
{
    return this == o;
}
Community
  • 1
  • 1
  • "tokenA.equals(tokenB) compares whatever is defined in .equals() for that class regardless of identity." This is true, except that in my case `equals` is defined to return reference identity. "So two objects can have .equals() return true and not be the same object instance" Again, in my case equals is defined in such a way that this is false. You're talking about generalities, I'm talking about this specific case. – TLW Jun 17 '16 at 17:18
  • "You just have to write the code and make it do what you want." Right, and I'm asking how to do so. – TLW Jun 17 '16 at 17:18