16

I need to generate a unique friend list from a list that can have duplicates by merging all duplicate entries into one object
Example - Friends are fetched from different social feeds and put into 1 big list
1. Friend - [name: "Johnny Depp", dob: "1970-11-10", source: "FB", fbAttribute: ".."]
2. Friend - [name: "Christian Bale", dob: "1970-01-01", source: "LI", liAttribute: ".."]
3. Friend - [name: "Johnny Depp", dob: "1970-11-10", source: "Twitter", twitterAttribute: ".."]
4. Friend - [name: "Johnny Depp", dob: "1970-11-10", source: "LinkedIn", liAttribute: ".."]
5. Friend - [name: "Christian Bale", dob: "1970-01-01", source: "LI", liAttribute: ".."]

Expected output
1. Friend - [name: "Christian Bale", dob: "1970-01-01", liAttribute: "..", fbAttribute: "..", twitterAttribute: ".."]
2. Friend - [name: "Johnny Depp", dob: "1970-11-10", liAttribute: "..", fbAttribute: "..", twitterAttribute: ".."]

Question - How can i merge without using any intermediate container? I can easily use an intermediate map and apply reduce on each value of the entry.

List<Friend> friends;
Map<String, List<Friend>> uniqueFriendMap
    = friends.stream().groupingBy(Friend::uniqueFunction);
List<Friend> mergedFriends = uniqueFriendMap.entrySet()
    .stream()
    .map(entry -> {
           return entry.getValue()
                .stream()
                .reduce((a,b) -> friendMergeFunction(a,b));
    })
    .filter(mergedPlace -> mergedPlace.isPresent())
    .collect(Collectors.toList());

I like to do this without using the intermediate Map uniqueFriendMap. Any suggestions?

Hai Hoang
  • 1,207
  • 2
  • 18
  • 35
Vku
  • 163
  • 1
  • 1
  • 7

1 Answers1

19

The groupingBy operation (or something similar) is unavoidable, the Map created by the operation is also used during the operation for looking up the grouping keys and finding the duplicates. But you can combine it with the reduction of the group elements:

Map<String, Friend> uniqueFriendMap = friends.stream()
    .collect(Collectors.groupingBy(Friend::uniqueFunction,
        Collectors.collectingAndThen(
            Collectors.reducing((a,b) -> friendMergeFunction(a,b)), Optional::get)));

The values of the map are already the resulting distinct friends. If you really need a List, you can create it with a plain Collection operation:

List<Friend> mergedFriends = new ArrayList<>(uniqueFriendMap.values());

If this second operation still annoys you, you can hide it within the collect operation:

List<Friend> mergedFriends = friends.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.groupingBy(Friend::uniqueFunction, Collectors.collectingAndThen(
            Collectors.reducing((a,b) -> friendMergeFunction(a,b)), Optional::get)),
        m -> new ArrayList<>(m.values())));

Since the nested collector represents a Reduction (see also this answer), we can use toMap instead:

List<Friend> mergedFriends = friends.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.toMap(Friend::uniqueFunction, Function.identity(),
            (a,b) -> friendMergeFunction(a,b)),
        m -> new ArrayList<>(m.values())));

Depending on whether friendMergeFunction is a static method or instance method, you may replace (a,b) -> friendMergeFunction(a,b) with DeclaringClass::friendMergeFunction or this::friendMergeFunction.


But note that even within your original approach, several simplifications are possible. When you only process the values of a Map, you don’t need to use the entrySet(), which requires you to call getValue() on each entry. You can process the values() in the first place. Then, you don’t need the verbose input -> { return expression; } syntax, as input -> expression is sufficient. Since the groups of the preceding grouping operation can not be empty, the filter step is obsolete. So your original approach would look like:

Map<String, List<Friend>> uniqueFriendMap
    = friends.stream().collect(Collectors.groupingBy(Friend::uniqueFunction));
List<Friend> mergedFriends = uniqueFriendMap.values().stream()
    .map(group -> group.stream().reduce((a,b) -> friendMergeFunction(a,b)).get())
    .collect(Collectors.toList());

which is not so bad. As said, the fused operation doesn’t skip the Map creation as that’s unavoidable. It only skips the creations of the Lists representing each group, as it will reduce them to a single Friend in-place.

Holger
  • 285,553
  • 42
  • 434
  • 765
  • 1
    Hi Holger, maybe you could replace `Collectors.groupingBy` with a `Collectors.toMap`, something like this `Map uniqueFriendMap = friends.stream().collect(Collectors.toMap(Friend::uniqueFunction, Function.identity(), this::friendMergeFunction));`, it is much simpler don't you think ? – Nicolas Filotto May 26 '17 at 16:41
  • In regards to Nicolas's comment I believe that the uniqueFunction could produce collisions on the hash which would force you to write an equals method that does not really represent the full object. – Dave Aug 29 '18 at 14:43
  • 1
    @Dave I don’t know what you mean with “uniqueFunction could produce collisions on the hash”. That function returns strings and merging those `Friend` instances for which the uniqueFunction returns the same string, as the task of this question. – Holger Aug 29 '18 at 14:49
  • @Holger If the Friend instances that you are merging have different sources but you still want to merge them than you need to write hashcode and equals methods that do not represent the full object. Functional in this case, but not a standard thing to do. – Dave Sep 06 '18 at 18:42
  • 1
    @Dave It seems, you are confusing things here. For a `HashMap`, only the hash code and equals methods of the **keys** are relevant and, as already said in my previous comment, here, the keys are the `String` instances returned by the method `uniqueFunction`. There is no “full object” used as key, just a single property. What will be merged, are the **values** of the map, whose equals or hash code implementation is entirely irrelevant. And after you haven’t explained your term “produce collisions on the hash”, you introduced another one, to “have different sources”… – Holger Sep 06 '18 at 19:17
  • @Holger in regards to "have different sources" 'source' is one of the attributes of the Friend object. I agree that writing a uniqueFunction that returns a string and using that as the key should avoid collisions. I don't recall why my situation required the full object. I agree that your solution solves this problem. – Dave Nov 02 '18 at 13:44