I want to remove duplicate records from an ArrayList based on multiple properties. This is a sample domain object class:

private String mdl;
private String ndc;
private String gpi;
private String labelName;
private int seqNo;
private String vendorName;

The mdl, ndc, gpi, and seqNo together make up a unique record. I want to check the ArrayList for duplicates on these 4 properties and remove a record from the list if a record with the same 4 properties already exists in the list.

posed1940
  • Customize the hashCode and equals methods, then store the objects in a Set. – Liping Huang Jul 11 '19 at 23:36
  • Could you show an example of that? I've already overridden the hashcode and equals method but how exactly would I go about implementing it to only check for these specific properties? – posed1940 Jul 11 '19 at 23:39
  • From an extensibility point of view, I'm wondering if the asker really wants to have `equals` & `hashCode`, or whether it would be enough to have a custom comparator and a collection backed by it. This way "id-equivalence" could be kept separate from all-field equals (which might be needed in other parts of the application). – Adam Kotwasinski Jul 12 '19 at 00:15
  • It would help also if you posted the code for the overridden equals and hashcode methods. – Edward Jul 12 '19 at 00:34
  • I don't think this is a duplicate of https://stackoverflow.com/questions/2265503/why-do-i-need-to-override-the-equals-and-hashcode-methods-in-java based on OP's selected answer. Seems they want a way to compare without using equals() and hashCode() – Edward Jul 12 '19 at 21:55
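The comparator-backed collection suggested in the comments above could look roughly like the sketch below. The class names DrugItem and ComparatorDedup are hypothetical stand-ins, and the sketch assumes the three String key fields are never null; equals and hashCode are left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the domain class; equals/hashCode are NOT overridden.
final class DrugItem {
    final String mdl;
    final String ndc;
    final String gpi;
    final String labelName;
    final int seqNo;
    final String vendorName;

    DrugItem(String mdl, String ndc, String gpi, String labelName, int seqNo, String vendorName) {
        this.mdl = mdl;
        this.ndc = ndc;
        this.gpi = gpi;
        this.labelName = labelName;
        this.seqNo = seqNo;
        this.vendorName = vendorName;
    }
}

final class ComparatorDedup {
    // Two items compare as equal exactly when all four key fields match.
    // Assumes mdl, ndc and gpi are non-null.
    static final Comparator<DrugItem> BY_KEY = Comparator
            .comparing((DrugItem d) -> d.mdl)
            .thenComparing(d -> d.ndc)
            .thenComparing(d -> d.gpi)
            .thenComparingInt(d -> d.seqNo);

    // Keeps the first occurrence of each key and preserves the list order.
    static List<DrugItem> dedupe(List<DrugItem> list) {
        Set<DrugItem> seen = new TreeSet<>(BY_KEY);
        List<DrugItem> result = new ArrayList<>();
        for (DrugItem d : list) {
            if (seen.add(d)) { // add returns false when the key was already seen
                result.add(d);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<DrugItem> list = Arrays.asList(
                new DrugItem("m1", "n1", "g1", "Label A", 1, "Vendor X"),
                new DrugItem("m1", "n1", "g1", "Label B", 1, "Vendor Y"),
                new DrugItem("m1", "n1", "g1", "Label C", 2, "Vendor Z"));
        System.out.println(dedupe(list).size());
    }
}
```

The trade-off is that the TreeSet treats the comparator as its notion of equality, so the "same record" rule stays local to this one use and does not affect how the class behaves elsewhere.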

2 Answers

.equals() and .hashCode() should be overridden to account for your key: mdl, ndc, gpi, seqNo. There are countless guides to doing this on this site, but something like:

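// Requires: import java.util.Objects;
@Override
public boolean equals(Object obj) {
    if (this == obj) {
        return true;
    }
    // instanceof is false for null, so no separate null check is needed
    if (!(obj instanceof MyClass)) {
        return false;
    }
    MyClass o = (MyClass) obj;
    // Objects.equals is null-safe, unlike calling mdl.equals(...) directly
    return seqNo == o.seqNo
            && Objects.equals(mdl, o.mdl)
            && Objects.equals(ndc, o.ndc)
            && Objects.equals(gpi, o.gpi);
}

@Override
public int hashCode() {
    // Must be computed from the same four fields that equals compares
    return Objects.hash(mdl, ndc, gpi, seqNo);
}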

There may be more efficient ways of implementing them if that's a concern.

Then you can just convert your list to a set with:

Set<MyClass> set = new HashSet<>(list);

The resulting set won't contain any duplicates. If you need a list again, you can replace the original with list = new ArrayList<>(set);

If you want to maintain the order of the items in the original list, instantiate LinkedHashSet instead of HashSet.
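As a minimal end-to-end sketch of the steps above: the class name Drug, the constructor, and the sample values are assumptions made for illustration; only the six fields come from the question. It uses LinkedHashSet so the first occurrence of each record is kept in its original position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the question's domain class.
final class Drug {
    final String mdl;
    final String ndc;
    final String gpi;
    final String labelName;
    final int seqNo;
    final String vendorName;

    Drug(String mdl, String ndc, String gpi, String labelName, int seqNo, String vendorName) {
        this.mdl = mdl;
        this.ndc = ndc;
        this.gpi = gpi;
        this.labelName = labelName;
        this.seqNo = seqNo;
        this.vendorName = vendorName;
    }

    // Equality is defined by the four key fields only.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Drug)) {
            return false;
        }
        Drug o = (Drug) obj;
        return seqNo == o.seqNo
                && Objects.equals(mdl, o.mdl)
                && Objects.equals(ndc, o.ndc)
                && Objects.equals(gpi, o.gpi);
    }

    @Override
    public int hashCode() {
        return Objects.hash(mdl, ndc, gpi, seqNo);
    }
}

final class LinkedHashSetDedup {
    // Drops later duplicates; LinkedHashSet preserves insertion order.
    static List<Drug> dedupe(List<Drug> list) {
        return new ArrayList<>(new LinkedHashSet<>(list));
    }

    public static void main(String[] args) {
        List<Drug> list = Arrays.asList(
                new Drug("m1", "n1", "g1", "Label A", 1, "Vendor X"),
                new Drug("m2", "n2", "g2", "Label B", 1, "Vendor Y"),
                new Drug("m1", "n1", "g1", "Label C", 1, "Vendor Z")); // same key as the first
        System.out.println(dedupe(list).size());
    }
}
```

Note that labelName and vendorName are ignored by equals here, so two records that differ only in those fields count as duplicates and the later one is dropped.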

Unrelated to your direct question, perhaps consider using a Set instead of List if you want to avoid duplicates in the first place. It will make your code more efficient (less memory usage without the duplicates) and eliminate the need to search for duplicates afterwards.

Edward
  • OP does not specify in the question that he overrode those methods to support the uniqueness logic with those 4 variables, hence this is based on the assumption that he did. – buræquete Jul 12 '19 at 00:21
  • @buræquete Actually, OP did specify in his comment under the question: "I've already overridden the hashcode and equals method" – Edward Jul 12 '19 at 00:23
  • Yes, but you can override them with logic that doesn't use those 4 variables as the unique hash values, right? He might've included other fields. He did not specify that he did so in those methods. If you include in your answer how to do that, maybe then that would be OK (**"but how exactly would I go about implementing it to only check for these specific properties?"**) – buræquete Jul 12 '19 at 00:24
  • @buræquete I see what you're saying. I interpret "how exactly would I go about implementing it to only check for these specific properties?" as meaning OP wants to know how to eliminate the duplicates based on the equals & hashcode functions, not how to correctly implement equals and hashcode. I suppose it's ambiguous. – Edward Jul 12 '19 at 00:33
  • I would consider this approach inappropriate for a single use case, unless the rules of the object were consistent and allowed for `equals` (and `hashcode`) to always be used against these properties - it's not "wrong", but it may not be "right" either – MadProgrammer Jul 12 '19 at 01:52
  • @MadProgrammer Is that not implied by the question? "sample domain object class" and "The mdl, ndc, gpi, and seqNo together make up a unique record."? But yes, I agree that if by "unique record" OP means just for this task as opposed to the domain object in general. – Edward Jul 12 '19 at 01:58

You can try doing the following:

List<Obj> list = ...; // list contains multiple objects
Collection<Obj> nonDuplicateCollection = list.stream()
        .collect(Collectors.toMap(Obj::generateUniqueKey, Function.identity(), (a, b) -> a))
        .values();

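The merge function (a, b) -> a means that when two objects produce the same key, the final map keeps the earlier object and discards the later one; use (a, b) -> b if you'd like to keep the later one instead. Note that this toMap overload makes no guarantee about the map type, so the values may not come back in the original list order.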

where Obj is:

public static class Obj {

    private String mdl;
    private String ndc;
    private String gpi;
    private String labelName;
    private int seqNo;
    private String vendorName;

    // other getter/setters

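    public String generateUniqueKey() {
        // A separator keeps ("ab", "c") and ("a", "bc") from producing the same key;
        // this assumes '|' never occurs in the field values
        return String.join("|", mdl, ndc, gpi, String.valueOf(seqNo));
    }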
}

I'd rather do something like this than override hashCode or equals, whose default behaviour may be needed by other logic. Plus, showing explicitly how uniqueness is asserted, with a dedicated method like generateUniqueKey, is much better for readability and maintainability than hiding that logic in some hashCode method.
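A possible variation on the same idea, offered as a sketch rather than a definitive implementation: use a List of the four fields as the map key instead of a String, and pass LinkedHashMap::new so the original order is kept. The names DrugEntry, uniqueKey and KeyedDedup are hypothetical; List.equals and List.hashCode compare element by element, so no separator character is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for the domain class; equals/hashCode are NOT overridden.
final class DrugEntry {
    final String mdl;
    final String ndc;
    final String gpi;
    final String labelName;
    final int seqNo;
    final String vendorName;

    DrugEntry(String mdl, String ndc, String gpi, String labelName, int seqNo, String vendorName) {
        this.mdl = mdl;
        this.ndc = ndc;
        this.gpi = gpi;
        this.labelName = labelName;
        this.seqNo = seqNo;
        this.vendorName = vendorName;
    }

    // Composite key: two keys are equal only if all four elements are equal.
    List<Object> uniqueKey() {
        return Arrays.asList(mdl, ndc, gpi, seqNo);
    }
}

final class KeyedDedup {
    // Keeps the first object seen for each key, in encounter order.
    static List<DrugEntry> dedupe(List<DrugEntry> list) {
        return new ArrayList<>(list.stream()
                .collect(Collectors.toMap(
                        DrugEntry::uniqueKey,
                        Function.identity(),
                        (first, second) -> first, // on a key clash, keep the earlier object
                        LinkedHashMap::new))      // preserves insertion order
                .values());
    }

    public static void main(String[] args) {
        List<DrugEntry> list = Arrays.asList(
                new DrugEntry("ab", "c", "g1", "Label A", 1, "Vendor X"),
                new DrugEntry("a", "bc", "g1", "Label B", 1, "Vendor Y"), // different key
                new DrugEntry("ab", "c", "g1", "Label C", 1, "Vendor Z")); // duplicate of the first
        System.out.println(dedupe(list).size());
    }
}
```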

buræquete
  • You could do it this way if you didn't want to override equals and hashCode for whatever reason, but OP has already implemented them, which seems like the correct thing to do given the "unique record" definition in the question. In which case just putting the list into a set would be a simpler approach. – Edward Jul 12 '19 at 00:14