11

I have a collection of objects, let's call them A, B, C, D,... and some are equal to others. If A and C are equal, then I want to replace every reference to C with a reference to A. This means (a) object C can be garbage collected, freeing up memory, and (b) I can later use "==" to compare objects in place of an expensive equals() operation. (These objects are large and the equals() operation is slow.)

My instinct was to use a java.util.Set. When I encounter C I can easily see if there is an entry in the Set equal to C. But if there is, there seems to be no easy way to find out what that entry is, and replace my reference to the existing entry. Am I mistaken? Iterating over all the entries to find the one that matches is obviously a non-starter.

Currently, instead of a Set, I'm using a Map in which the value is always the same as the key. Calling map.get(C) then finds A. This works, but it feels incredibly convoluted. Is there a more elegant way of doing it?

Michael Kay
  • 156,231
  • 11
  • 92
  • 164
  • 4
    This post (https://stackoverflow.com/questions/7283338/getting-an-element-from-a-set) and especially its first answer is pretty relevant to this post, although I don't think that it's a duplicate. From what I can tell though, the Map is not an uncommon practice here. – Kevin W. Sep 11 '18 at 20:29
  • 1
    Looking at the source for `HashSet`, `public boolean add(E e) { return map.put(e, PRESENT)==null; }`. If I'm not mistaken, this is exactly the behaviour you want? – Neil Sep 11 '18 at 20:48
  • 3
    @NeilEdelman that doesn't return the object it returns true or false.true if there was no mapping and false if the object was already there. – mavriksc Sep 11 '18 at 21:04
  • correct me if i'm wrong, you want to use `Set` to avoid duplicate objects, but still you override the `.equals()` method `HashMap` or `Set` uses `.equals()` to compare the object, so why don't you use `ArrayList()` but before insert you can check with `contains()` and you can get index by `indexof()` – Ryuzaki L Sep 11 '18 at 21:12
  • You are right. There is not an 'easy' way to get the object. You have to iterate, so maybe your first thought, of using HashSet, is not the best approach – stelios.anastasakis Sep 11 '18 at 21:26
  • 2
    You also have the SortedSet which enables you to do a binary lookup.But I don't se the MAP as a bad idea. If you don't need order, I would go with the Map. – Juan Sep 11 '18 at 21:36
  • How are you defining equivalence? Is it acceptable to assume transitivity between elements of A and C (i.e. they can be dereferenced interchangably with no side effects)? – Makoto Sep 11 '18 at 22:36
  • Here is a discussion of what you may be looking for https://www.baeldung.com/java-flyweight – Pawel Zieminski Sep 12 '18 at 00:29

1 Answers1

4

This problem is not simple de-duplication: it is a form of canonicalization.

The standard approach is to use a Map rather than a Set. Here's a sketch of how to do it:

public <T> List<T> canonicalizeList(List<T> input) {
    HashMap<T, T> map = new HashMap<>();
    List<T> output = new ArrayList<>();
    for (T element: input) {
        T canonical = map.get(element);
        if (canonical == null) {
            element = canonical;
            map.put(canonical, canonical);
        }
        output.add(canonical);
    }
    return output;
}

Note that this is O(N). If you can safely assume that the percentage of duplicates in input is likely to be small, then you could set the capacity of map and output to the size of input.


Now you seem to be saying that you are doing it this way already (last paragraph), and you are asking if there is a better way. As far as I know, there isn't one. (The HashSet API lets would let you test if a set contains a value equal to element, but it does not let you find out what it is in O(1).)

For what it is worth, under the hood the HashSet<T> class is implemented as a HashMap<T, T>. So you would not be saving time or space by using a HashSet directly ...

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • You could probably shorten this to input.stream().distinct().collect(Collectors.toList()) – Pawel Zieminski Sep 12 '18 at 00:23
  • Yes ... but then the OP might not understand it :-) – Stephen C Sep 12 '18 at 00:56
  • Thanks. This seem to confirm that there isn't a better way than what I'm doing already. Perhaps I should encapsulate my Map structure in something that hides the clunkiness in an implementation class rather than exposing it all over the place. – Michael Kay Sep 12 '18 at 10:08