Using a Map
is the right approach, but you can encapsulate it in a Set
implementation having precisely the intended behavior of “a Set
with custom equals
”.
public class CustomSet<E> extends AbstractSet<E> {
private final Function<E, Object> theKeyFunction;
private final HashMap<Object, E> backend = new HashMap<>();
public CustomSet(Function<E,Object> keyFunction) {
theKeyFunction = Objects.requireNonNull(keyFunction);
}
@Override
public int size() {
return backend.size();
}
@Override
public boolean add(E e) {
Objects.requireNonNull(e);
return backend.putIfAbsent(theKeyFunction.apply(e), e) == null;
}
@Override
public boolean contains(Object o) {
if(o == null) return false;
@SuppressWarnings("unchecked") E e = (E)o;
Object key;
try { key = theKeyFunction.apply(e); }
catch(ClassCastException ex) { return false; }
return backend.containsKey(key);
}
@Override
public boolean remove(Object o) {
if(o == null) return false;
@SuppressWarnings("unchecked") E e = (E)o;
Object key;
try { key = theKeyFunction.apply(e); }
catch(ClassCastException ex) { return false; }
return backend.remove(key) != null;
}
@Override
public void clear() {
backend.clear();
}
@Override
public boolean retainAll(Collection<?> c) {
return backend.values().retainAll(c);
}
@Override
public boolean removeIf(Predicate<? super E> filter) {
return backend.values().removeIf(filter);
}
@Override
public void forEach(Consumer<? super E> action) {
backend.values().forEach(action);
}
@Override
public Iterator<E> iterator() {
return backend.values().iterator();
}
@Override
public Spliterator<E> spliterator() {
return backend.values().spliterator();
}
@Override
public Object[] toArray() {
return backend.values().toArray();
}
@Override
public <T> T[] toArray(T[] a) {
return backend.values().toArray(a);
}
}
To keep it simple, this Set
does not support null
.
This class overrides some methods it doesn’t have to, to provide better performance when iterating or streaming over it. Besides that, it’s rather simple. If you think, “but a Set
that internally uses a Map
is quite inefficient”, look at the source code of HashSet
or TreeSet
…
This set implementation can be tested like
record Person(String name, int age) {}
Set<Person> nameSet = new CustomSet<>(Person::name);
Set<Person> ageSet = new CustomSet<>(Person::age);
for(String name: List.of("John", "Paul", "George", "Ringo")) {
for(int age: new int[] { 20, 24, 27, 31 }) {
Person p = new Person(name, age);
if(nameSet.add(p)) System.out.println("added " + p + " to nameSet");
if(ageSet.add(p)) System.out.println("added " + p + " to ageSet");
}
}
System.out.println();
System.out.println("nameSet: " + nameSet);
System.out.println("ageSet: " + ageSet);
System.out.println();
Person p = new Person("Paul", 100);
System.out.println("nameSet contains " + p + "? " + nameSet.contains(p));
System.out.println("ageSet contains " + p + "? " + ageSet.contains(p));
p = new Person("Bob", 27);
System.out.println("nameSet contains " + p + "? " + nameSet.contains(p));
System.out.println("ageSet contains " + p + "? " + ageSet.contains(p));
added Person[name=John, age=20] to nameSet
added Person[name=John, age=20] to ageSet
added Person[name=John, age=24] to ageSet
added Person[name=John, age=27] to ageSet
added Person[name=John, age=31] to ageSet
added Person[name=Paul, age=20] to nameSet
added Person[name=George, age=20] to nameSet
added Person[name=Ringo, age=20] to nameSet
nameSet: [Person[name=George, age=20], Person[name=John, age=20], Person[name=Ringo, age=20], Person[name=Paul, age=20]]
ageSet: [Person[name=John, age=20], Person[name=John, age=24], Person[name=John, age=27], Person[name=John, age=31]]
nameSet contains Person[name=Paul, age=100]?true
ageSet contains Person[name=Paul, age=100]?false
nameSet contains Person[name=Bob, age=27]?false
ageSet contains Person[name=Bob, age=27]?true
demonstrating the different understanding of equality of the two sets, which leads to the same warning as applies to TreeSet
with comparators not consistent with equals. Mixing sets with different key functions can lead to the same weird behavior as mixing sorted sets with different comparators or mixing such sets with an ordinary hash set.
If the key consists of multiple properties, a dedicated key object is the way to go, but that doesn’t mean that the application domain object has to be a composed object:
record MyData(int alpha, double beta, int foo, double bar) {}
Set<MyData> set = new CustomSet<>(d -> {
record Key(int alpha, double beta) {}
return new Key(d.alpha(), d.beta());
});
set.add(new MyData(1, 1.0, 100, 1.23));
System.out.println(set.contains(new MyData(1, 1.0, -1, Double.NaN))); // true
Solutions for older Java versions without record
are a bit more verbose, but the principle stays the same. If you don’t need maximum performance, you can also use List
keys as they have a working equals
and hashCode
implementation:
// Java 8 compatible
Set<MyData> set = new CustomSet<>(d -> Arrays.asList(d.alpha(), d.beta()));