118

I am trying to remove duplicates from a List of objects based on some property.

can we do it in a simple way using java 8

List<Employee> employee

Can we remove duplicates from it based on id property of employee. I have seen posts removing duplicate strings form arraylist of string.

Patan
  • 17,073
  • 36
  • 124
  • 198

9 Answers9

192

You can get a stream from the List and put in in the TreeSet from which you provide a custom comparator that compares id uniquely.

Then if you really need a list you can put then back this collection into an ArrayList.

import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.toCollection;

...
List<Employee> unique = employee.stream()
                                .collect(collectingAndThen(toCollection(() -> new TreeSet<>(comparingInt(Employee::getId))),
                                                           ArrayList::new));

Given the example:

List<Employee> employee = Arrays.asList(new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));

It will output:

[Employee{id=1, name='John'}, Employee{id=2, name='Alice'}]

Another idea could be to use a wrapper that wraps an employee and have the equals and hashcode method based with its id:

class WrapperEmployee {
    private Employee e;

    public WrapperEmployee(Employee e) {
        this.e = e;
    }

    public Employee unwrap() {
        return this.e;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        WrapperEmployee that = (WrapperEmployee) o;
        return Objects.equals(e.getId(), that.e.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(e.getId());
    }
}

Then you wrap each instance, call distinct(), unwrap them and collect the result in a list.

List<Employee> unique = employee.stream()
                                .map(WrapperEmployee::new)
                                .distinct()
                                .map(WrapperEmployee::unwrap)
                                .collect(Collectors.toList());

In fact, I think you can make this wrapper generic by providing a function that will do the comparison:

public class Wrapper<T, U> {
    private T t;
    private Function<T, U> equalityFunction;

    public Wrapper(T t, Function<T, U> equalityFunction) {
        this.t = t;
        this.equalityFunction = equalityFunction;
    }

    public T unwrap() {
        return this.t;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        @SuppressWarnings("unchecked")
        Wrapper<T, U> that = (Wrapper<T, U>) o;
        return Objects.equals(equalityFunction.apply(this.t), that.equalityFunction.apply(that.t));
    }

    @Override
    public int hashCode() {
        return Objects.hash(equalityFunction.apply(this.t));
    }
}

and the mapping will be:

.map(e -> new Wrapper<>(e, Employee::getId))
shareef
  • 9,255
  • 13
  • 58
  • 89
Alexis C.
  • 91,686
  • 21
  • 171
  • 177
  • 8
    Your first suggestion is a far better answer than the wrapper one :). Wrapper is obvious, but the first one is much better. I wasnt aware of collectingAndThen – Jatin Apr 16 '15 at 11:21
  • I implemented this way but some time I get null pointer exception when executing the test cases. Strange thing is that, when I re execute the test cases, the error goes away. Do you have any idea why this happens. It would be great if you can help me on this. – Patan Apr 20 '15 at 09:50
  • 1
    @Patan Hard to tell without the test cases, check if you don't have any null reference in the list. – Alexis C. Apr 21 '15 at 07:25
  • 1
    The first example works for me but I don't understand why :) – Bevor Jan 03 '17 at 16:57
  • 1
    @Bevor It uses the set constructor that takes a comparator as parameter. In the example, every employe with the same id will be considered equals and will be unique in the resulting set. – Alexis C. Jan 03 '17 at 18:06
  • for older Java? On Android many of these works only from API 24 – user25 Sep 09 '18 at 13:55
  • Made it for Android API 14: `TreeSet – user25 Sep 09 '18 at 14:18
  • `collectingAndThen` - interesting thing which will allow to write such algorithm just in one line. Though I'm not sure if it's really that great - we get quite long line why just don't write two or more lines? – user25 Sep 09 '18 at 14:20
  • This solution is `O(NlogN)` just to remove duplicates, because it uses `TreeSet` (and `TreeSet.add` operation is `O(logN)`). – fps Oct 10 '18 at 20:28
  • This example is weird, IntelliJ IDEA won't suggest to add imports automatically. Seems it's not a good idea to use `import static java.util.Comparator.comparingInt; import static java.util.stream.Collectors.collectingAndThen; import static java.util.stream.Collectors.toCollection; ` – user25 Jan 21 '19 at 21:57
  • 1
    In your first answer, instead of using `collectingAndThen`, would it be more efficient to just wrap the solution using the `new ArrayList<(ArrayList(Collection extends E> c)` constructor? So the final code would be `List unique = new ArrayList<>(employee.stream().collect(toCollection(() -> new TreeSet<>(comparingInt(Employee::getId)))));` – cbender Jan 27 '19 at 05:47
  • @cbender I want to use `List unique = employee.stream() .collect(collectingAndThen(toCollection(() -> new TreeSet<>(comparingInt(Employee::getId))), ArrayList::new));` But with two fields. Here only one field given, `getId`. How can I check two fields to remove duplicate. – Avijit Barua Mar 24 '19 at 13:01
  • 4
    @AvijitBarua you can compare as many fields as you want. The `TreeSet` constructor will accept any `Comparator`. In Java 8 and onward the `comparingInt` method is just a quick way to create a Comparator that compares `int` fields. If you want to add another field to the comparison you can use the `thenComparing` chained to the original compare so it would look something like `comparingInt(Employee::getId).thenComparing(Employee::getName)`. This seems like a good article explaining Comparators - https://www.baeldung.com/java-8-comparator-comparing. – cbender Mar 24 '19 at 14:07
  • How to get the last updated value like new Employee(1, "Bob")? – Jeff Cook Feb 16 '22 at 13:31
104

The easiest way to do it directly in the list is

HashSet<Object> seen = new HashSet<>();
employee.removeIf(e -> !seen.add(e.getID()));
  • removeIf will remove an element if it meets the specified criteria
  • Set.add will return false if it did not modify the Set, i.e. already contains the value
  • combining these two, it will remove all elements (employees) whose id has been encountered before

Of course, it only works if the list supports removal of elements.

RubioRic
  • 2,442
  • 4
  • 28
  • 35
Holger
  • 285,553
  • 42
  • 434
  • 765
  • You assume that id is unique, what if I have composite key – Kamil Nękanowicz Mar 23 '17 at 07:54
  • 3
    @user3871754: you need an object holding the composite key and having appropriate `equals` and `hashCode` implementations, e.g. `yourList.removeIf(e -> !seen.add(Arrays.asList(e.getFirstKeyPart(), e.getSecondKeyPart()));`. Composing the key via `Arrays.asList` works with an arbitrary number of components, whereas for small numbers of components a dedicated key type might be more efficient. – Holger Mar 23 '17 at 09:42
  • what do you mean all? I need to left at least one – user25 Sep 09 '18 at 13:57
  • 4
    Great answer! This worked better for me than the accepted answer, even though both are good! – OmNiOwNeR Oct 13 '18 at 21:14
  • Great solution, works for me. – VXp Jan 11 '21 at 12:38
63

If you can make use of equals, then filter the list by using distinct within a stream (see answers above). If you can not or don't want to override the equals method, you can filter the stream in the following way for any property, e.g. for the property Name (the same for the property Id etc.):

Set<String> nameSet = new HashSet<>();
List<Employee> employeesDistinctByName = employees.stream()
            .filter(e -> nameSet.add(e.getName()))
            .collect(Collectors.toList());
Rolch2015
  • 1,354
  • 1
  • 14
  • 20
  • 1
    This was pretty fine, it takes advantage of the simple functionality of filter that decide to filter or maintain every element based on a predicate (predicate to apply to each element to determine if it should be included), based on the property (String type) insertion in a set : true if newly inserted, false if it exists already...that was smart ! work great for me ! – Shessuky Jul 18 '18 at 10:24
  • 1
    This example is good and simple. – Nathani Software Dec 04 '19 at 07:58
  • Does it work fine in multi-threaded scenarios / parallel streams? I mean, is it thread safe kinda thing? – Arun Gowda Oct 23 '20 at 16:48
  • This is nice solution to remove duplicate items from the list. But my question was to get item form 2 list whose ids are not same. – Masi Boo Dec 04 '20 at 14:44
  • wow! nice and simple. – logbasex Jun 25 '21 at 07:09
23

Another solution is to use a Predicate, then you can use this in any filter:

public static <T> Predicate<T> distinctBy(Function<? super T, ?> f) {
  Set<Object> objects = new ConcurrentHashSet<>();
  return t -> objects.add(f.apply(t));
}

Then simply reuse the predicate anywhere:

employees.stream().filter(distinctBy(e -> e.getId));

Note: in the JavaDoc of filter, which says it takes a stateless Predicte. Actually, this works fine even if the stream is parallel.


About other solutions:

1) Using .collect(Collectors.toConcurrentMap(..)).values() is a good solution, but it's annoying if you want to sort and keep the order.

2) stream.removeIf(e->!seen.add(e.getID())); is also another very good solution. But we need to make sure the collection implemented removeIf, for example it will throw exception if we construct the collection use Arrays.asList(..).

navins
  • 3,429
  • 2
  • 28
  • 29
18

Try this code:

Collection<Employee> nonDuplicatedEmployees = employees.stream()
   .<Map<Integer, Employee>> collect(HashMap::new,(m,e)->m.put(e.getId(), e), Map::putAll)
   .values();
Tho
  • 23,158
  • 6
  • 60
  • 47
13

This worked for me:

list.stream().distinct().collect(Collectors.toList());

You need to implement equals, of course

Sebastian D'Agostino
  • 1,575
  • 2
  • 27
  • 44
11

If order does not matter and when it's more performant to run in parallel, Collect to a Map and then get values:

employee.stream().collect(Collectors.toConcurrentMap(Employee::getId, Function.identity(), (p, q) -> p)).values()
Xiao Liu
  • 450
  • 4
  • 12
  • 2
    So, I guess something like this if you want a list back: `employee.stream().collect(Collectors.toConcurrentMap(Employee::getId, Function.identity(), (p, q) -> p)).values().stream().collect(Collectors.toList())`. And, regarding parallel, you can you use it or not here - I mean parallelStream API? – Rok T. Apr 14 '20 at 11:30
  • 1
    @RokT.. no need to re-create a stream, just wrap it in an ArrayList. ex:- new ArrayList<>(.stream().collect()......values()); – Tharindu Eranga Feb 05 '21 at 04:11
2

There are a lot of good answers here but I didn't find the one about using reduce method. So for your case, you can apply it in following way:

 List<Employee> employeeList = employees.stream()
      .reduce(new ArrayList<>(), (List<Employee> accumulator, Employee employee) ->
      {
        if (accumulator.stream().noneMatch(emp -> emp.getId().equals(employee.getId())))
        {
          accumulator.add(employee);
        }
        return accumulator;
      }, (acc1, acc2) ->
      {
        acc1.addAll(acc2);
        return acc1;
      });
Alex
  • 1,940
  • 2
  • 18
  • 36
  • working with parallel Streams there's a chance that the combiner will add together employees with the same id again.. in that case you need to check there aswell for duplicates. – Sven Dhaens Aug 23 '17 at 15:50
0

Another version which is simple

BiFunction<TreeSet<Employee>,List<Employee> ,TreeSet<Employee>> appendTree = (y,x) -> (y.addAll(x))? y:y;

TreeSet<Employee> outputList = appendTree.apply(new TreeSet<Employee>(Comparator.comparing(p->p.getId())),personList);
zawhtut
  • 8,335
  • 5
  • 52
  • 76
  • 4
    This is an obfuscated version of `TreeSet outputList = new TreeSet<>(Comparator.comparing(p->p.getId())); outputList.addAll(personList);` The straight-forward code is much simpler. – Holger Jul 07 '17 at 06:21