A set-like collection with a customizable "equals"

Question

I have a data class:

public class MyData {
  final Integer alpha;
  final Double beta;
  final Integer foo;
  final Double bar;
}

I need .equals and .hashCode to have the conventional definition involving all four fields. But I have another important requirement:

Given a large number of MyData objects, I need to rapidly check whether a new MyData object matches any of the existing ones on the .alpha and .beta fields only.

Three approaches I don't want to take:

Composite object:

public class MyData {
  final MyDataKey alphaAndBeta;
  final Integer foo;
  final Double bar;
}
public class MyDataKey {
  final Integer alpha;
  final Double beta;
}

While I could then do lookups against a HashSet<MyDataKey>, it's inelegant because all other uses of the object will need to refer to dataObj.alphaAndBeta.alpha instead of dataObj.alpha.

Comparator:

public class OnlyAlphaAndBeta implements Comparator<MyData> {
  int compare(MyData a, MyData b) {...}
}

This would then let a new TreeSet<MyData>(new OnlyAlphaAndBeta()) do the lookups I want; but with O(log(N)) instead of O(1).

Multi-level lookup:

public class MyDataLookup {
  Map<Integer, Set<Double>> existingAlphaBeta;
  
  boolean contains(MyData query) {
    Set<Double> betas = this.existingAlphaBeta.get(query.alpha);
    if (betas == null) {
      return false;
    }
    return betas.contains(query.beta);
  }

  boolean add(MyData toInsert) {...};
}

This does the job, in O(1), but what if the key was more than just 2 fields? I could keep nesting Map<A, Map<B, Map<C, ...>>> for each field in the key, but this doesn't seem right. Surely I'd rather compute just one hash and look it up in one table.

I think what I'm looking for is something like HashSet, but which can be specialized to use something other than the .equals and .hashCode methods, analogous to how Comparator redefines ordering for SortedSet. Such a collection wouldn't fulfill the Set contract anymore, but it would be "set-like".

Does something like this exist in any of the big, well-maintained Java utility libraries? Alternately, am I overlooking some obvious way of accomplishing my goals?

Have you considered using `HashMap>`, this should provide efficient O(1) lookups? However, such map needs to be collected / maintained separately from existing `Collection` — Nowhere Man, Dec 08 '21 at 08:06
"it's inelegant because all other uses of the object will need to refer to dataObj.alphaAndBeta.alpha instead of dataObj.alpha." → why don't you use getters (and setters)? dataObj.alphaAndBeta.alpha becomes dataObj.getAlpha(), which is very commonly used in Java. In fact, raw field access is usually discouraged in favor of private fields and getter+setter. — Clashsoft, Dec 08 '21 at 10:54

score 1 · Answer 1 · answered Dec 08 '21 at 15:00

Using a Map is the right approach, but you can encapsulate it in a Set implementation having precisely the intended behavior of “a Set with custom equals”.

public class CustomSet<E> extends AbstractSet<E> {
    private final Function<E, Object> theKeyFunction;
    private final HashMap<Object, E> backend = new HashMap<>();

    public CustomSet(Function<E,Object> keyFunction) {
        theKeyFunction = Objects.requireNonNull(keyFunction);
    }

    @Override
    public int size() {
        return backend.size();
    }

    @Override
    public boolean add(E e) {
        Objects.requireNonNull(e);
        return backend.putIfAbsent(theKeyFunction.apply(e), e) == null;
    }

    @Override
    public boolean contains(Object o) {
        if(o == null) return false;
        @SuppressWarnings("unchecked") E e = (E)o;
        Object key;
        try { key = theKeyFunction.apply(e); }
        catch(ClassCastException ex) { return false; }
        return backend.containsKey(key);
    }

    @Override
    public boolean remove(Object o) {
        if(o == null) return false;
        @SuppressWarnings("unchecked") E e = (E)o;
        Object key;
        try { key = theKeyFunction.apply(e); }
        catch(ClassCastException ex) { return false; }
        return backend.remove(key) != null;
    }

    @Override
    public void clear() {
        backend.clear();
    }

    @Override
    public boolean retainAll(Collection<?> c) {
        return backend.values().retainAll(c);
    }

    @Override
    public boolean removeIf(Predicate<? super E> filter) {
        return backend.values().removeIf(filter);
    }

    @Override
    public void forEach(Consumer<? super E> action) {
        backend.values().forEach(action);
    }

    @Override
    public Iterator<E> iterator() {
        return backend.values().iterator();
    }

    @Override
    public Spliterator<E> spliterator() {
        return backend.values().spliterator();
    }

    @Override
    public Object[] toArray() {
        return backend.values().toArray();
    }

    @Override
    public <T> T[] toArray(T[] a) {
        return backend.values().toArray(a);
    }    
}

To keep it simple, this Set does not support null.

This class overrides some methods it doesn’t have to, to provide better performance when iterating or streaming over it. Besides that, it’s rather simple. If you think, “but a Set that internally uses a Map is quite inefficient”, look at the source code of HashSet or TreeSet…

This set implementation can be tested like

record Person(String name, int age) {}

Set<Person> nameSet = new CustomSet<>(Person::name);
Set<Person> ageSet = new CustomSet<>(Person::age);

for(String name: List.of("John", "Paul", "George", "Ringo")) {
    for(int age: new int[] { 20, 24, 27, 31 }) {
        Person p = new Person(name, age);
        if(nameSet.add(p)) System.out.println("added " + p + " to nameSet");
        if(ageSet.add(p)) System.out.println("added " + p + " to ageSet");
    }
}
System.out.println();
System.out.println("nameSet: " + nameSet);
System.out.println("ageSet:  " + ageSet);
System.out.println();
Person p = new Person("Paul", 100);
System.out.println("nameSet contains " + p + "? " + nameSet.contains(p));
System.out.println("ageSet  contains " + p + "? " + ageSet.contains(p));
p = new Person("Bob", 27);
System.out.println("nameSet contains " + p + "? " + nameSet.contains(p));
System.out.println("ageSet  contains " + p + "? " + ageSet.contains(p));

added Person[name=John, age=20] to nameSet
added Person[name=John, age=20] to ageSet
added Person[name=John, age=24] to ageSet
added Person[name=John, age=27] to ageSet
added Person[name=John, age=31] to ageSet
added Person[name=Paul, age=20] to nameSet
added Person[name=George, age=20] to nameSet
added Person[name=Ringo, age=20] to nameSet

nameSet: [Person[name=George, age=20], Person[name=John, age=20], Person[name=Ringo, age=20], Person[name=Paul, age=20]]
ageSet:  [Person[name=John, age=20], Person[name=John, age=24], Person[name=John, age=27], Person[name=John, age=31]]

nameSet contains Person[name=Paul, age=100]?true
ageSet  contains Person[name=Paul, age=100]?false
nameSet contains Person[name=Bob, age=27]?false
ageSet  contains Person[name=Bob, age=27]?true

demonstrating the different understanding of equality of the two sets, which leads to the same warning as applies to TreeSet with comparators not consistent with equals. Mixing sets with different key functions can lead to the same weird behavior as mixing sorted sets with different comparators or mixing such sets with an ordinary hash set.

If the key consists of multiple properties, a dedicated key object is the way to go, but that doesn’t mean that the application domain object has to be a composed object:

record MyData(int alpha, double beta, int foo, double bar) {}

Set<MyData> set = new CustomSet<>(d -> {
    record Key(int alpha, double beta) {}
    return new Key(d.alpha(), d.beta());
});

set.add(new MyData(1, 1.0, 100, 1.23));
System.out.println(set.contains(new MyData(1, 1.0, -1, Double.NaN))); // true

Solutions for older Java versions without record are a bit more verbose, but the principle stays the same. If you don’t need maximum performance, you can also use List keys as they have a working equals and hashCode implementation:

// Java 8 compatible
Set<MyData> set = new CustomSet<>(d -> Arrays.asList(d.alpha(), d.beta()));

score 1 · Answer 2 · answered Dec 10 '21 at 05:47

I promise that I did try searching for an answer before writing this question. But 24 hours later I had a much more productive search, and found several viable answers:

score 0 · Answer 3 · answered Dec 08 '21 at 10:49

Keep a Map of the MyData values where the map key is defined uniquely by alpha and beta. In this example I'm concatenating them as Strings:

class MyDataLookup {
    Map<String, MyData> map = new HashMap<>();

    public MyData put(MyData value) {
        return map.put(getKey(value), value);
    }

    public boolean contains(MyData value) {
        return map.containsKey(getKey(value));
    }

    private static String getKey(MyData value) {
        return value.alpha +"_"+ value.beta;
    }
}

From this you can easily change the definition of containment by modifying MyDataLookup#getKey (say to include foo as well).

A set-like collection with a customizable "equals"

3 Answers3