I have a particular requirement where I need to dedupe a list of objects based on a combination of equality criteria.
For example, two `Student` objects are equal if:

1. `firstName` and `id` are the same, OR
2. `lastName`, `class`, and `emailId` are the same.
I was planning to use a `Set` to remove duplicates. However, there's a problem: I can override the `equals` method, but the `hashCode` method may not return the same hash code for two equal objects.
```java
@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    Student other = (Student) obj;
    // `class` is a reserved word in Java, so the field is called className here.
    // Assumes id is a primitive int and className is a String.
    return (firstName.equals(other.firstName) && id == other.id)
        || (lastName.equals(other.lastName)
            && className.equals(other.className)
            && emailId.equals(other.emailId));
}
```
Now I cannot override the `hashCode` method so that it returns the same hash code for any two objects that are equal according to this `equals` method.
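A small sketch of why no useful `hashCode` exists here, assuming a Java 16+ record with a hypothetical `className` field in place of `class`. The relation is not transitive: `a` matches `b` on `firstName` + `id`, `b` matches `c` on `lastName` + `className` + `emailId`, yet `a` does not match `c`. More generally, for any two students X and Y, a third student Z with X's `firstName`/`id` and Y's `lastName`/`className`/`emailId` is equal to both, so a consistent hash would need hash(X) = hash(Z) = hash(Y). If any combination of field values can occur, only a constant hash satisfies that.

```java
// Hypothetical Student shape (Java 16+ record); className stands in for `class`.
record Student(String firstName, int id, String lastName, String className, String emailId) {
    // Same two-criterion rule as the question's equals method.
    boolean matches(Student o) {
        return (firstName.equals(o.firstName) && id == o.id)
            || (lastName.equals(o.lastName)
                && className.equals(o.className)
                && emailId.equals(o.emailId));
    }
}

final class TransitivityDemo {
    public static void main(String[] args) {
        Student a = new Student("Ann", 1, "Lee", "10A", "ann@example.com");
        Student b = new Student("Ann", 1, "Kim", "10B", "kim@example.com"); // same firstName + id as a
        Student c = new Student("Bob", 2, "Kim", "10B", "kim@example.com"); // same lastName + className + emailId as b
        System.out.println(a.matches(b)); // true
        System.out.println(b.matches(c)); // true
        System.out.println(a.matches(c)); // false: not transitive
    }
}
```

Because `Object.equals` is required to be transitive, a `HashSet` built on this `equals` would also behave unpredictably, even with a constant `hashCode`.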
Is there a way to dedupe based on multiple equality criteria? I considered using a `List` and then calling its `contains` method to check whether the element is already there, but `contains` runs in O(n) time, which makes the whole pass O(n²). I don't want to return the same hash code for all objects, as that puts every element in the same bucket, degrades lookups toward a linear scan, and defeats the purpose of using hash codes. I've also considered sorting the items with a custom comparator, but that again takes at least O(n log n), plus one more pass to remove the duplicates.
As of now, the best solution I have is to maintain two different sets, one for each condition, and use them to build a `List`, but that takes almost three times the memory. I'm looking for a faster and more memory-efficient way, as I'll be dealing with a large number of records.
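A minimal sketch of the two-set approach, assuming Java 16+ records, a hypothetical `className` field in place of `class`, and a "keep the first occurrence" policy. Each set holds a small composite key record whose generated `equals`/`hashCode` cover exactly one criterion. The keys reference the existing `String` objects rather than copying them (as concatenating the fields into new strings would), so the extra cost per kept student is two small key objects plus two hash-table entries.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical Student shape (Java 16+ record); className stands in for `class`.
record Student(String firstName, int id, String lastName, String className, String emailId) {}

final class TwoKeyDedupe {
    // One composite key per equality criterion; records supply equals/hashCode.
    private record NameIdKey(String firstName, int id) {}
    private record ContactKey(String lastName, String className, String emailId) {}

    // Keeps the first occurrence and drops any later student that shares either
    // key with a student already kept. Expected O(n) time with HashSet lookups.
    static List<Student> dedupe(List<Student> students) {
        Set<NameIdKey> seenNameId = new HashSet<>();
        Set<ContactKey> seenContact = new HashSet<>();
        List<Student> result = new ArrayList<>();
        for (Student s : students) {
            NameIdKey k1 = new NameIdKey(s.firstName(), s.id());
            ContactKey k2 = new ContactKey(s.lastName(), s.className(), s.emailId());
            if (seenNameId.contains(k1) || seenContact.contains(k2)) {
                continue; // matches a kept student on one criterion
            }
            seenNameId.add(k1);
            seenContact.add(k2);
            result.add(s);
        }
        return result;
    }
}
```

Only kept students contribute keys here, so with `a`, `b`, `c` where `a` matches `b` and `b` matches `c` (but not `a`), the result is `[a, c]`. Whether `c` should also be dropped, treating matches transitively, is a policy choice the `equals` method above does not settle.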