I would like to remove any values from that stream beforehand.
As @JimGarrison has pointed out, preprocessing the data doesn't make sense.
You can't know whether a name is unique until the whole data set has been processed.
Another thing to consider is that inside the stream pipeline (before the collector) you have no knowledge of what data has been encountered previously, because the results of intermediate operations should not depend on any state.
If you think of streams as a sequence of loops and therefore assume that it's possible to preprocess stream elements before collecting them, that's not correct. Elements of the stream pipeline are processed lazily, one at a time. I.e. all the operations in the pipeline are applied to a single element before the next one is pulled, and each operation is applied only if it's needed (that's what laziness means).
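To make the one-element-at-a-time behaviour concrete, here is a minimal sketch. The class name `LazyPipelineDemo`, the `trace` helper and the length-based filter are made up for illustration and are not part of the original problem; `peek()` is used only to record which stage sees which element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class LazyPipelineDemo {

    // Hypothetical helper: records the order in which the stages of a
    // sequential pipeline see each element, followed by the final result.
    static List<String> trace(List<String> names) {
        List<String> log = new ArrayList<>();
        List<String> result = names.stream()
            .peek(name -> log.add("filter sees " + name))
            .filter(name -> name.length() > 3)          // arbitrary condition for the demo
            .peek(name -> log.add("map sees " + name))  // reached only by elements that passed the filter
            .map(String::toUpperCase)
            .collect(Collectors.toList());
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        trace(List.of("Bob", "Alise", "Carol")).forEach(System.out::println);
    }
}
```

For this sequential stream the log reads `filter sees Bob`, `filter sees Alise`, `map sees Alise`, `filter sees Carol`, `map sees Carol`, `result [ALISE, CAROL]`. Each element travels through the whole pipeline before the next one starts, so no stage ever has the full data set available to decide whether a name is a duplicate.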
For more information, have a look at this tutorial and the API documentation.
Implementations
You can segregate unique values and duplicates in a single stream statement by utilizing Collectors.teeing() and a custom object that holds separate collections of the duplicated and unique entries of the phone book.
Since the primary function of this object is only to carry data, I've implemented it as a Java 16 record.
public record FilteredPhoneBook(Map<String, String> uniquePersonsAddressByName,
                                List<String> duplicatedNames) {}
The teeing() collector expects three arguments: two collectors and a function that merges the results produced by both collectors.
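As a minimal sketch of that three-argument shape, independent of the phone book problem: the class `TeeingDemo` and the method `average` below are hypothetical names chosen for illustration. It requires Java 12 or later, where `Collectors.teeing()` was introduced.

```java
import java.util.List;
import java.util.stream.Collectors;

class TeeingDemo {

    // Hypothetical example: both downstream collectors see every element,
    // and the merger combines their two results into one value.
    static double average(List<Integer> numbers) {
        return numbers.stream()
            .collect(Collectors.teeing(
                Collectors.counting(),                     // first collector, produces a Long
                Collectors.summingInt(Integer::intValue),  // second collector, produces an Integer
                (count, sum) -> count == 0 ? 0.0 : sum.doubleValue() / count  // merger
            ));
    }

    public static void main(String[] args) {
        System.out.println(average(List.of(2, 4, 9))); // prints 5.0
    }
}
```

The merger runs exactly once, after the whole stream has been consumed, which is why it is the right place for logic that needs to know about all elements.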
The map generated by groupingBy() in conjunction with counting() is meant to determine the duplicated names.
Since there's no point in preprocessing the data, toMap(), which is used as the second collector, will create a map containing all names.
When both collectors hand their results to the merger function, it takes care of removing the duplicates.
public static FilteredPhoneBook getFilteredPhoneBook(Collection<Person> people) {
    return people.stream()
        .collect(Collectors.teeing(
            Collectors.groupingBy(Person::getName, Collectors.counting()), // intermediate Map<String, Long>
            Collectors.toMap(                                              // intermediate Map<String, String>
                Person::getName,
                Person::getAddress,
                (left, right) -> left),
            (Map<String, Long> countByName, Map<String, String> addressByName) -> {
                countByName.values().removeIf(count -> count == 1);        // removing unique names
                addressByName.keySet().removeAll(countByName.keySet());    // removing all duplicates
                return new FilteredPhoneBook(addressByName, new ArrayList<>(countByName.keySet()));
            }
        ));
}
Another way to address this problem is to use a Map<String, Boolean> as the means of discovering duplicates, as @Holger has suggested.
The first collector is written using toMap(). It associates true with a key that has been encountered only once, and its mergeFunction assigns false if at least one duplicate is found.
The rest of the logic remains the same.
public static FilteredPhoneBook getFilteredPhoneBook(Collection<Person> people) {
    return people.stream()
        .collect(Collectors.teeing(
            Collectors.toMap(                    // intermediate Map<String, Boolean>
                Person::getName,
                person -> true,                  // not proved to be a duplicate and initially considered unique
                (left, right) -> false),         // is a duplicate
            Collectors.toMap(                    // intermediate Map<String, String>
                Person::getName,
                Person::getAddress,
                (left, right) -> left),
            (Map<String, Boolean> isUniqueByName, Map<String, String> addressByName) -> {
                isUniqueByName.values().removeIf(Boolean::booleanValue);   // removing unique names
                addressByName.keySet().removeAll(isUniqueByName.keySet()); // removing all duplicates
                return new FilteredPhoneBook(addressByName, new ArrayList<>(isUniqueByName.keySet()));
            }
        ));
}
main() - demo
public static void main(String[] args) {
    List<Person> people = List.of(
        new Person("Alise", "address1"),
        new Person("Bob", "address2"),
        new Person("Bob", "address3"),
        new Person("Carol", "address4"),
        new Person("Bob", "address5")
    );
    FilteredPhoneBook filteredPhoneBook = getFilteredPhoneBook(people);
    System.out.println("Unique entries:");
    // use the record's accessor method rather than its private field
    filteredPhoneBook.uniquePersonsAddressByName().forEach((k, v) -> System.out.println(k + " : " + v));
    System.out.println("\nDuplicates:");
    filteredPhoneBook.duplicatedNames().forEach(System.out::println);
}
Output
Unique entries:
Alise : address1
Carol : address4
Duplicates:
Bob