
I have a case where I want to remove objects from a list if there are duplicate ids. The items that should be removed are the ones with the older dates, so that only the newest object for each id remains. How can I do this using Java streams in a clean way? I was thinking it should be possible to group the objects by id first, then sort them by date and only select the first object, or something similar, but I'm struggling with how to do this.

Example:

```java
package org.example;

import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class Main {

  class Student {
    private String id;
    private LocalDateTime startDatetime;

    public String getId() {
      return id;
    }

    public void setId(String id) {
      this.id = id;
    }

    public LocalDateTime getStartDatetime() {
      return startDatetime;
    }

    public void setStartDatetime(LocalDateTime startDatetime) {
      this.startDatetime = startDatetime;
    }

    public Student(String id, LocalDateTime startDatetime) {
      this.id = id;
      this.startDatetime = startDatetime;
    }
  }

  public static void main(String[] args) {
    new Main();
  }

  public Main() {
    List<Student> students = new ArrayList<>() {
      {
        add(new Student("1", LocalDateTime.now()));
        add(new Student("1", LocalDateTime.of(2000, 02, 01, 01, 01)));
        add(new Student("1", LocalDateTime.of(1990, 02, 01, 01, 01)));
        add(new Student("2", LocalDateTime.of(1990, 02, 01, 01, 01)));
      } };

    //Now this list should be sorted as follows:
    //If two or more student objects have the same id, keep only the one with the latest startDatetime.
    //Thus, the result should contain only 2 objects: the one with id 1 and LocalDateTime.now(), and the one with id 2.

    Map<String, List<Student>> groupedStudents =
        students.stream().collect(Collectors.groupingBy(Student::getId));

  }
}
```

hejha

1 Answer


To eliminate the duplicate students (i.e. those having the same id) from the list, we can use an auxiliary Map.

This Map should associate a single instance of Student (the one with the latest start date) with each id. The proper Collector for that purpose is the three-argument version of toMap(), which expects:

  • a keyMapper, which generates a key from the consumed stream element;
  • a valueMapper generating a value;
  • and a mergeFunction responsible for resolving duplicates.
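To show the three arguments in isolation, here is a minimal sketch that uses plain strings instead of students (the class name ToMapMergeDemo and the sample words are made up for illustration). Words are keyed by their first letter, and the merge function decides which word survives when two of them produce the same key. Without the third argument, toMap() would throw an IllegalStateException on the duplicate keys.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapMergeDemo {

    // Keys each word by its first letter. When two words share a key,
    // the merge function keeps the longer one (the existing one wins a tie).
    static Map<Character, String> longestByFirstLetter(List<String> words) {
        return words.stream()
            .collect(Collectors.toMap(
                word -> word.charAt(0),                 // keyMapper
                Function.identity(),                    // valueMapper
                (existing, incoming) ->                 // mergeFunction
                    incoming.length() > existing.length() ? incoming : existing));
    }

    public static void main(String[] args) {
        Map<Character, String> result =
            longestByFirstLetter(List.of("ant", "apple", "bee", "bear"));
        System.out.println(result.get('a')); // apple
        System.out.println(result.get('b')); // bear
    }
}
```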

To implement the mergeFunction, we can use the static method BinaryOperator.maxBy(), which expects a Comparator as an argument. To define the comparator, we can use the Java 8 method Comparator.comparing().
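The merge function on its own can be tried out as in the following sketch. It assumes a minimal Student class with the question's getId() and getStartDatetime() getters; the class name MaxByDemo and the field LATEST are hypothetical. Whichever argument order is used, the Student with the later start date is returned.

```java
import java.time.LocalDateTime;
import java.util.Comparator;
import java.util.function.BinaryOperator;

class MaxByDemo {

    // Minimal stand-in for the question's Student class (assumed, not the original).
    static final class Student {
        private final String id;
        private final LocalDateTime startDatetime;

        public Student(String id, LocalDateTime startDatetime) {
            this.id = id;
            this.startDatetime = startDatetime;
        }

        public String getId() { return id; }
        public LocalDateTime getStartDatetime() { return startDatetime; }
    }

    // Merge function: of two Students, return the one with the later start date.
    static final BinaryOperator<Student> LATEST =
        BinaryOperator.maxBy(Comparator.comparing(Student::getStartDatetime));

    public static void main(String[] args) {
        Student older = new Student("1", LocalDateTime.of(1990, 2, 1, 1, 1));
        Student newer = new Student("1", LocalDateTime.of(2000, 2, 1, 1, 1));
        System.out.println(LATEST.apply(older, newer) == newer); // true
        System.out.println(LATEST.apply(newer, older) == newer); // true
    }
}
```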

Finally, to generate a list of students with unique ids, we need to create a stream over the values of the intermediate Map, apply sorting, and collect the elements into a List.

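```java
import java.time.LocalDateTime;
import java.util.Comparator;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

List<Student> students = List.of(
    new Student("1", LocalDateTime.now()),
    new Student("1", LocalDateTime.of(2000, 02, 01, 01, 01)),
    new Student("1", LocalDateTime.of(1990, 02, 01, 01, 01)),
    new Student("2", LocalDateTime.of(1990, 02, 01, 01, 01))
);

List<Student> uniqueStudents = students.stream()
    .collect(Collectors.toMap(
        Student::getId,       // key: the student's id
        Function.identity(),  // value: the student itself
        // on a duplicate id, keep the student with the latest start date
        BinaryOperator.maxBy(Comparator.comparing(Student::getStartDatetime))
    ))
    .values().stream()        // the map's values have no guaranteed order...
    .sorted(Comparator.comparing(Student::getStartDatetime)) // ...so sort them
    .toList(); // Java 16+; on earlier versions use .collect(Collectors.toList())
```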

Output:

```
Student{id='2', startDatetime=1990-02-01T01:01}
Student{id='1', startDatetime=2022-11-01T14:03:17.858753}
```
Alexander Ivanchenko
  • Thanks! However the result I'm looking for is actually just a list (with only unique students as per the sorting) and not a map. :) – hejha Nov 01 '22 at 10:50
  • @hejha Need a `List` of students having unique id as a result, right? See the update. – Alexander Ivanchenko Nov 01 '22 at 10:55
  • If you don’t require the result to be a *list*, then alternatively `Collection uniqueStudents = students.stream() .collect(Collectors.groupingBy(Student::getId, Collectors.collectingAndThen(Collectors.maxBy(Comparator.comparing(Student::getStartDatetime)), Optional::orElseThrow))) .values();`. Depending on taste. Can probably be modified to return a `List` if required. – Ole V.V. Nov 01 '22 at 12:24
  • @OleV.V. Sure, it can be written using `groupingBy()`, but since here we're producing a Map that associates a *key* with a single *value*, Collector `toMap()` would be semantically more suitable in this case. `groupingBy()` is more appropriate when a **group** of *values* needs to be associated with the same *key*. See this [answer by Holger](https://stackoverflow.com/a/57042622/17949945). And from the perspective of readability, one collector is better than three nested collectors. – Alexander Ivanchenko Nov 01 '22 at 12:29
  • @OleV.V. You omitted sorting. With `.collect(groupingBy...).values()` there's no way to obtain students in sorted order by `startDatetime`. Alternatively, we can create a custom collector that uses a map as its internal accumulation type and returns a sorted list without creating the second stream, but it would be more complex than using built-in Collectors. – Alexander Ivanchenko Nov 01 '22 at 12:41
  • I didn’t read any requirement in the question that the result should be sorted? The sorting mentioned was only for finding the latest with each ID (which my code does), the way I interpreted it. You are right, of course, a non-list collection generally hasn’t got any built-in order so cannot be a sorted collection. And it may well be that my suggestion is unsuited in case a sorted list is required as end result. – Ole V.V. Nov 01 '22 at 12:49
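Ole V.V.'s `groupingBy()` suggestion from the comments can be turned into a `List` by copying the map's values, roughly as in this sketch. It assumes a minimal Student class with the question's getters; the names GroupingByLatest and latestPerId are hypothetical. The no-argument `Optional.orElseThrow()` requires Java 10+, and the resulting list is in no particular order.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class GroupingByLatest {

    // Minimal stand-in for the question's Student class (assumed, not the original).
    static final class Student {
        private final String id;
        private final LocalDateTime startDatetime;

        public Student(String id, LocalDateTime startDatetime) {
            this.id = id;
            this.startDatetime = startDatetime;
        }

        public String getId() { return id; }
        public LocalDateTime getStartDatetime() { return startDatetime; }

        @Override
        public String toString() {
            return "Student{id='" + id + "', startDatetime=" + startDatetime + "}";
        }
    }

    // Groups by id, reduces each group to its latest Student, then copies
    // the map's values into a List (order is unspecified).
    static List<Student> latestPerId(List<Student> students) {
        return new ArrayList<>(students.stream()
            .collect(Collectors.groupingBy(
                Student::getId,
                Collectors.collectingAndThen(
                    Collectors.maxBy(Comparator.comparing(Student::getStartDatetime)),
                    Optional::orElseThrow))) // a group is never empty
            .values());
    }

    public static void main(String[] args) {
        List<Student> students = List.of(
            new Student("1", LocalDateTime.now()),
            new Student("1", LocalDateTime.of(2000, 2, 1, 1, 1)),
            new Student("1", LocalDateTime.of(1990, 2, 1, 1, 1)),
            new Student("2", LocalDateTime.of(1990, 2, 1, 1, 1)));
        latestPerId(students).forEach(System.out::println);
    }
}
```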
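The custom collector Alexander Ivanchenko mentions in the comments could look roughly like the following sketch, built with `Collector.of()`: a `HashMap` keyed by id is the accumulation type, `Map.merge()` keeps the latest Student per id, and the finisher returns a List sorted by start date, so no second stream is needed. The names LatestPerIdCollector and latestPerIdSorted are hypothetical, and Student is the same assumed minimal stand-in.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collector;

class LatestPerIdCollector {

    // Minimal stand-in for the question's Student class (assumed, not the original).
    static final class Student {
        private final String id;
        private final LocalDateTime startDatetime;

        public Student(String id, LocalDateTime startDatetime) {
            this.id = id;
            this.startDatetime = startDatetime;
        }

        public String getId() { return id; }
        public LocalDateTime getStartDatetime() { return startDatetime; }
    }

    static final Comparator<Student> BY_START = Comparator.comparing(Student::getStartDatetime);
    static final BinaryOperator<Student> LATEST = BinaryOperator.maxBy(BY_START);

    // Accumulates the latest Student per id in a HashMap, then finishes
    // by sorting the map's values by start date into a List.
    static Collector<Student, Map<String, Student>, List<Student>> latestPerIdSorted() {
        return Collector.of(
            HashMap::new,                                      // supplier
            (map, s) -> map.merge(s.getId(), s, LATEST),       // accumulator
            (left, right) -> {                                 // combiner (parallel streams)
                right.forEach((id, s) -> left.merge(id, s, LATEST));
                return left;
            },
            map -> {                                           // finisher
                List<Student> result = new ArrayList<>(map.values());
                result.sort(BY_START);
                return result;
            });
    }

    public static void main(String[] args) {
        List<Student> students = List.of(
            new Student("1", LocalDateTime.now()),
            new Student("1", LocalDateTime.of(2000, 2, 1, 1, 1)),
            new Student("1", LocalDateTime.of(1990, 2, 1, 1, 1)),
            new Student("2", LocalDateTime.of(1990, 2, 1, 1, 1)));
        students.stream()
            .collect(latestPerIdSorted())
            .forEach(s -> System.out.println(s.getId())); // prints 2, then 1
    }
}
```

Compared with the two-stream version in the answer, this trades a few more lines for a single pass, which matches the comment's point that it is more complex than combining built-in Collectors.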