5

I have a particular requirement where I need to dedupe a list of objects based on a combination of equality criteria.

e.g. Two Student objects are equal if:
1. firstName and id are same OR 2. lastName, class, and emailId are same

I was planning to use a Set to remove duplicates. However, there's a problem:
I can override the equals method but the hashCode method may not return same hash code for two equal objects.

@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;

    Student other = (Student) obj;
    if ((firstName.equals(other.firstName) && id==other.id) ||
            (lastName.equals(other.lastName) && class==other.class && emailId.equals(other.emailId ))
        return true;        
    return false;
}

Now I cannot override hashCode method in a way that it returns same hash codes for two objects that are equal according to this equals method.

Is there a way to dedupe based on multiple equality criteria? I considered using a List and then using the contains method to check if the element is already there, but this increases the complexity as contains runs in O(n) time. I don't want to return the exact same hash codes for all the objects as that's just increases the time and beats the purpose of using hash codes. I've also considered sorting items using a custom comparator, but that again takes at least O(n log n), plus one more walk through to remove the duplicates.

As of now, the best solution I have is to maintain two different sets, one for each condition and use that to build a List, but that takes almost three times the memory. I'm looking for a faster and memory efficient way as I'll be dealing with a large number of records.

Suraj Bajaj
  • 6,630
  • 5
  • 34
  • 49
  • 1
    *"Now I cannot override hashCode method in a way that it returns same hash codes for two objects that are equal according to this equals method"* Can you explain why you say this? – m0skit0 Jul 29 '15 at 13:27
  • See @m0skit0's comment. Your hashcode is wrong, if it doesn't return the same value for equal objects. – Sebastian S Jul 29 '15 at 13:31
  • @m0skit0 That's the general contract of [hashCode](http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode%28%29): `If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.` – Suraj Bajaj Jul 29 '15 at 19:03
  • I know hashCode contract, my question is what's the problem on implementing it. – m0skit0 Jul 29 '15 at 19:52
  • @m0skit0 There's an if-else statement in my equals method, so `hashCode` and `equals` will never be in sync. Considering the example, if I use all the five fields to compute `hashCode`, the `equals` will be based only on either two or three of them. So e.g. if `firstName` and `id` are same the objects are equal, but the other three values could be different, and hence the hashCode will also be different for the two objects. You can try to implement, you'll realize what I'm saying. – Suraj Bajaj Jul 29 '15 at 20:25

1 Answers1

0

You can make Student Comparable and use TreeSet. Simple implementation of compareTo may be:

@Override
public int compareTo(Student other) {
    if (this.equals(other)) {
        return 0;
    } else {
        return (this.firstName + this.lastName + emailId + clazz + id)
                .compareTo(other.firstName + other.lastName + other.emailId + clazz + id);
    }
}

Or make your own Set implementation, for instance containing a List of distinct Student objects, checking for equality every time you add a student. This will have O(n) add complexity, so can't be considered a good implementation, but it is simple to write.

class ListSet<T> extends AbstractSet<T> {
    private List<T> list = new ArrayList<T>();

    @Override
    public boolean add(T t) {
        if (list.contains(t)) {
            return false;
        } else {
            return list.add(t);
        }
    }

    @Override
    public Iterator<T> iterator() {
        return list.iterator();
    }

    @Override
    public int size() {
        return list.size();
    }
}
Danil Gaponov
  • 1,413
  • 13
  • 23