Java arbitrary comparator on Object

Question

Let's say I am building a TreeSet of an object for which the ordering depends on only one value.

I cannot do

TreeSet<Foo> tree = new TreeSet<>(Comparator.comparingInt(Foo::getX));

because if I add two different Foo objects with the same x then one would replace the other (i.e. if I do tree.add(foo1) and tree.add(foo2), tree.size() will be 1 instead of 2).

I could compare every field of Foo, but I would like two instances of Foo to be considered distinct, even if every field is the same.

One solution that "almost works" would be

TreeSet<Foo> tree = new TreeSet<>(Comparator.comparingInt(Foo::getX).thenComparing(Foo::hashCode));

But this would fail when there is a hash collision.

In summary, I am looking to something similar to

TreeSet<Foo> tree = new TreeSet<>(Comparator.comparingInt(Foo::getX).thenComparing(Foo::getInternalAddress));

But of course we don't have access to such method.

Workarounds

I know there are workarounds:

if I don't care about the objects themselves but just how many are in the tree, I can use a multiset TreeMap<Foo, Integer> (and comparing all the fields) that give me how many Foos for a specific x
if I do care about the objects (I am making reference equality checks) I can use a different multiset TreeMap<Foo, List<Foo>> (or TreeMap<Integer, List<Foo>> with the key being x). But if there are very few "duplicated" foos, this is a waste of space taken by all the singleton lists.

So while I know there are workarounds with a TreeMap, I still wonder if there is a way to do that with just a TreeSet.

Can you make changes to `Foo`? How about simply adding a `UUID` field and sort by that? — Sweeper, Oct 02 '22 at 11:48
I also slightly suspect that this is an [XY problem](https://xyproblem.info). — Sweeper, Oct 02 '22 at 11:53
Sounds like you want a sorted list, not a set. See [Why is there no SortedList in Java?](https://stackoverflow.com/q/8725387/12299000) — kaya3, Oct 02 '22 at 11:54
@Sweeper I could do that. If there is a way without modifying Foo, I'd rather have it, but this is a good workaround. Is there any risk of collision with `UUID.randomUUID()`? But I guess if I worry about collision I could simply have a static counter of the instances instead. — Ricola, Oct 02 '22 at 11:56
@Ricola A sensible implementation of a sorted list (or bag) will have O(log n) insertion and removal. See the linked Q&A for some suggestions. Depending on how you intend to use your collection, a priority queue may be a better option anyway. — kaya3, Oct 02 '22 at 11:59
@Ricola How do you want to order multiple `Foo` instances, when they have the same `X` value? How do you define your ordering of multiple `Foo` instances in that case? Why is one `Foo` instance placed "before" another `Foo` instance based on your desired comparator? — Progman, Oct 02 '22 at 13:01
@Gardener and @Progman I don't care about the ordering of two foos with the same X. What I need though is the comparator to be consistent with `Object::equals` and consistent with itself (i.e `a.compare(b)` returns always the same with same `a` and `b` and `b.compare(a)` to be consistent with `a.compare(b)`). This is why I gave the example of the internal address as I don't know how else I could have it (other than the UUI solution suggested by @Sweeper). What I am looking for is a BST where the nodes are ordered based on `x`, and ties are broken arbitrarily. — Ricola, Oct 02 '22 at 14:15
@kaya3 "A sensible implementation of a sorted list (or bag) will have O(log n) insertion and removal." => which is the multiset implementation I mentioned in my question — Ricola, Oct 02 '22 at 14:18
But the "multiset implementation" in your question is a set, not a bag. Rather than trying to make a set behave the way you want, it would be simpler to use a data structure which is supposed to behave that way. — kaya3, Oct 02 '22 at 14:20
You could try using `System.identityHashCode` instead of `Foo.hashCode` for the second comparison. There are still no guarantees about hash collisions though. — Rob Spoor, Oct 02 '22 at 14:21
@Sweeper "I also slightly suspect that this is an XY problem". I am not trying to solve a problem in particular, but a similar set of problems and I was curious if other solutions exist. To give more details, in competitive programming, I often need to have sorted objects based on some fields with fast insertion and retrieval. The multiset implementation is a bit cumbersome so I wanted to know if there is a way to achieve that with only a set (so it's faster to code). Your solution does meet the criteria, or I keep a generic multiset implementation somewhere that I can quickly paste. — Ricola, Oct 02 '22 at 14:26
@kaya3 "But the "multiset implementation" in your question is a set, not a bag." Not sure what you mean here, `TreeMap>` is the implementation of a bag. — Ricola, Oct 02 '22 at 14:30
Have a look at Guava’s [`Ordering.arbitrary`](https://guava.dev/releases/31.0-jre/api/docs/com/google/common/collect/Ordering.html#arbitrary()) — Stuart Marks, Oct 02 '22 at 16:49
@StuartMarks did we consider adding this to the JDK? If not, why? — Ricola, Oct 03 '22 at 09:40

Joachim Sauer · Accepted Answer · 2022-10-03T10:12:51.157

6

Guava provides Ordering.arbitrary() which you can use to construct your desired Comparator:

Comparator.comparingInt(Foo::getX).thenComparing(Ordering.arbitrary())

Ordering.arbitrary() internally uses System.identityHashCode which almost solves your "order arbitrary, but consider identity" problem, except there's no guarantee that identityHashCode will never collide (trivially, since it's a 32bit value and a Java project can easily create more than 2³² objects, given enough memory).

Ordering.arbitrary() uses some neat tricks to still sort objects with colliding identityHashCode values in a consistent (but arbitrary!) way, see its code.

edited Oct 03 '22 at 10:12

answered Oct 03 '22 at 08:28

Joachim Sauer

302,674
57
556
614

1

Isn't it more something like `.thenComparing(Ordering.arbitrary());`? – Ricola Oct 03 '22 at 09:24
1

@Ricola: yes, absolutely, that was a typo, my bad. – Joachim Sauer Oct 03 '22 at 10:13

Ricola · Answer 2 · 2022-10-03T10:47:32.630

0

With an external library

See @Joachim Sauer's answer and @Stuart Marks's comment : Guava's Ordering.arbitrary().

Without external libraries

Add an id field to Foo that is unique for each instance.

With UUID class

Suggested by @Sweeper in a comment.

class Foo {
  private UUID uuid = UUID.randomUUID();
...
}
...
new TreeSet<>(Comparator.comparingInt(Foo::getX).thenComparing(Foo::getUUID));

The chance of collisions are almost inexistent (https://stackoverflow.com/a/24876263/7424948), but if you are paranoid you can use the custom instance counter below.

Instance counter

class Foo {
  // if only one thread, otherwise use AtomicInteger for thread-safe.
  // assuming we'll get less than Integer.MAX_VALUE instances, otherwise use long.
  private static int counter = 0;
  private int id;

  public Foo(...){
    id = counter++;
  } 
...
}
...
new TreeSet<>(Comparator.comparingInt(Foo::getX).thenComparingInt(Foo::getId));

Advantage over UUID is that it takes less space (32 bits per instance instead of 128 bits) and that there is less computation to get the id.

edited Oct 03 '22 at 10:47

answered Oct 03 '22 at 09:12

Ricola

2,621
12
22

The `int` field can potentially suffer the exact same problem that `identityHashCode` has when it overflows. `long` would be safer here, but even that isn't guaranteed safe (since more than 2^64 `Foo` objects could be created in the lifetime of a VM). – Joachim Sauer Oct 03 '22 at 10:14
@JoachimSauer I agree, but this is why I wrote a comment over the field. It depends on the use case, the assumptions and the constraints. In the really worst case this can be a BigInteger. – Ricola Oct 03 '22 at 10:24
3

"since more than 2^64 Foo objects could be created in the lifetime of a VM". Not really. If you want to do that in 10 years, you have to allocate 58,454,204,609 Objects/second. – Johannes Kuhn Oct 03 '22 at 10:26
Also, I think a long has pretty much the same constraints as `Ordering.arbitrary()` current implementation (2^32 different hashcodes and then 2^32 different AtomicIntegers for collisions) – Ricola Oct 03 '22 at 10:29
@JohannesKuhn: fair point! And yes, @Ricola, a `long` would end up being roughly equivalent to what `Ordering.arbitrary()` does. – Joachim Sauer Oct 03 '22 at 12:15