3

Let's say I am building a TreeSet of an object for which the ordering depends on only one value.

I cannot do

TreeSet<Foo> tree = new TreeSet<>(Comparator.comparingInt(Foo::getX));

because if I add two different Foo objects with the same x then one would replace the other (i.e. if I do tree.add(foo1) and tree.add(foo2), tree.size() will be 1 instead of 2).

I could compare every field of Foo, but I would like two instances of Foo to be considered distinct, even if every field is the same.

One solution that "almost works" would be

TreeSet<Foo> tree = new TreeSet<>(Comparator.comparingInt(Foo::getX).thenComparing(Foo::hashCode));

But this would fail when there is a hash collision.

In summary, I am looking to something similar to

TreeSet<Foo> tree = new TreeSet<>(Comparator.comparingInt(Foo::getX).thenComparing(Foo::getInternalAddress));

But of course we don't have access to such method.

Workarounds

I know there are workarounds:

  • if I don't care about the objects themselves but just how many are in the tree, I can use a multiset TreeMap<Foo, Integer> (and comparing all the fields) that give me how many Foos for a specific x
  • if I do care about the objects (I am making reference equality checks) I can use a different multiset TreeMap<Foo, List<Foo>> (or TreeMap<Integer, List<Foo>> with the key being x). But if there are very few "duplicated" foos, this is a waste of space taken by all the singleton lists.

So while I know there are workarounds with a TreeMap, I still wonder if there is a way to do that with just a TreeSet.

Ricola
  • 2,621
  • 12
  • 22
  • 2
    Can you make changes to `Foo`? How about simply adding a `UUID` field and sort by that? – Sweeper Oct 02 '22 at 11:48
  • 3
    I also slightly suspect that this is an [XY problem](https://xyproblem.info). – Sweeper Oct 02 '22 at 11:53
  • 2
    Sounds like you want a sorted list, not a set. See [Why is there no SortedList in Java?](https://stackoverflow.com/q/8725387/12299000) – kaya3 Oct 02 '22 at 11:54
  • 1
    @Sweeper I could do that. If there is a way without modifying Foo, I'd rather have it, but this is a good workaround. Is there any risk of collision with `UUID.randomUUID()`? But I guess if I worry about collision I could simply have a static counter of the instances instead. – Ricola Oct 02 '22 at 11:56
  • @kaya3 I do want `O(log n)` insertion and deletion tough. – Ricola Oct 02 '22 at 11:58
  • @Ricola A sensible implementation of a sorted list (or bag) will have O(log n) insertion and removal. See the linked Q&A for some suggestions. Depending on how you intend to use your collection, a priority queue may be a better option anyway. – kaya3 Oct 02 '22 at 11:59
  • 1
    @Ricola How do you want to order multiple `Foo` instances, when they have the same `X` value? How do you define your ordering of multiple `Foo` instances in that case? Why is one `Foo` instance placed "before" another `Foo` instance based on your desired comparator? – Progman Oct 02 '22 at 13:01
  • @Gardener and @Progman I don't care about the ordering of two foos with the same X. What I need though is the comparator to be consistent with `Object::equals` and consistent with itself (i.e `a.compare(b)` returns always the same with same `a` and `b` and `b.compare(a)` to be consistent with `a.compare(b)`). This is why I gave the example of the internal address as I don't know how else I could have it (other than the UUI solution suggested by @Sweeper). What I am looking for is a BST where the nodes are ordered based on `x`, and ties are broken arbitrarily. – Ricola Oct 02 '22 at 14:15
  • @kaya3 "A sensible implementation of a sorted list (or bag) will have O(log n) insertion and removal." => which is the multiset implementation I mentioned in my question – Ricola Oct 02 '22 at 14:18
  • 1
    But the "multiset implementation" in your question is a set, not a bag. Rather than trying to make a set behave the way you want, it would be simpler to use a data structure which is supposed to behave that way. – kaya3 Oct 02 '22 at 14:20
  • 1
    You could try using `System.identityHashCode` instead of `Foo.hashCode` for the second comparison. There are still no guarantees about hash collisions though. – Rob Spoor Oct 02 '22 at 14:21
  • @Sweeper "I also slightly suspect that this is an XY problem". I am not trying to solve a problem in particular, but a similar set of problems and I was curious if other solutions exist. To give more details, in competitive programming, I often need to have sorted objects based on some fields with fast insertion and retrieval. The multiset implementation is a bit cumbersome so I wanted to know if there is a way to achieve that with only a set (so it's faster to code). Your solution does meet the criteria, or I keep a generic multiset implementation somewhere that I can quickly paste. – Ricola Oct 02 '22 at 14:26
  • @kaya3 "But the "multiset implementation" in your question is a set, not a bag." Not sure what you mean here, `TreeMap>` is the implementation of a bag. – Ricola Oct 02 '22 at 14:30
  • 6
    Have a look at Guava’s [`Ordering.arbitrary`](https://guava.dev/releases/31.0-jre/api/docs/com/google/common/collect/Ordering.html#arbitrary()) – Stuart Marks Oct 02 '22 at 16:49
  • @StuartMarks did we consider adding this to the JDK? If not, why? – Ricola Oct 03 '22 at 09:40
  • 2
    It's on the radar but it's not particularly high priority. – Stuart Marks Oct 03 '22 at 16:45

2 Answers2

6

Guava provides Ordering.arbitrary() which you can use to construct your desired Comparator:

Comparator.comparingInt(Foo::getX).thenComparing(Ordering.arbitrary())

Ordering.arbitrary() internally uses System.identityHashCode which almost solves your "order arbitrary, but consider identity" problem, except there's no guarantee that identityHashCode will never collide (trivially, since it's a 32bit value and a Java project can easily create more than 232 objects, given enough memory).

Ordering.arbitrary() uses some neat tricks to still sort objects with colliding identityHashCode values in a consistent (but arbitrary!) way, see its code.

Joachim Sauer
  • 302,674
  • 57
  • 556
  • 614
0

With an external library

See @Joachim Sauer's answer and @Stuart Marks's comment : Guava's Ordering.arbitrary().

Without external libraries

Add an id field to Foo that is unique for each instance.

With UUID class

Suggested by @Sweeper in a comment.

class Foo {
  private UUID uuid = UUID.randomUUID();
...
}
...
new TreeSet<>(Comparator.comparingInt(Foo::getX).thenComparing(Foo::getUUID));

The chance of collisions are almost inexistent (https://stackoverflow.com/a/24876263/7424948), but if you are paranoid you can use the custom instance counter below.

Instance counter

class Foo {
  // if only one thread, otherwise use AtomicInteger for thread-safe.
  // assuming we'll get less than Integer.MAX_VALUE instances, otherwise use long.
  private static int counter = 0;
  private int id;

  public Foo(...){
    id = counter++;
  } 
...
}
...
new TreeSet<>(Comparator.comparingInt(Foo::getX).thenComparingInt(Foo::getId));

Advantage over UUID is that it takes less space (32 bits per instance instead of 128 bits) and that there is less computation to get the id.

Ricola
  • 2,621
  • 12
  • 22
  • The `int` field can potentially suffer the exact same problem that `identityHashCode` has when it overflows. `long` would be safer here, but even that isn't guaranteed safe (since more than 2^64 `Foo` objects could be created in the lifetime of a VM). – Joachim Sauer Oct 03 '22 at 10:14
  • @JoachimSauer I agree, but this is why I wrote a comment over the field. It depends on the use case, the assumptions and the constraints. In the really worst case this can be a BigInteger. – Ricola Oct 03 '22 at 10:24
  • 3
    "since more than 2^64 Foo objects could be created in the lifetime of a VM". Not really. If you want to do that in 10 years, you have to allocate 58,454,204,609 Objects/second. – Johannes Kuhn Oct 03 '22 at 10:26
  • Also, I think a long has pretty much the same constraints as `Ordering.arbitrary()` current implementation (2^32 different hashcodes and then 2^32 different AtomicIntegers for collisions) – Ricola Oct 03 '22 at 10:29
  • @JohannesKuhn: fair point! And yes, @Ricola, a `long` would end up being roughly equivalent to what `Ordering.arbitrary()` does. – Joachim Sauer Oct 03 '22 at 12:15