Elegant way to flatMap Set of Sets inside groupingBy

Question

So I have a piece of code where I'm iterating over a list of data rows. Each row is a ReportData that contains a case with a Long caseId and one Ruling. Each Ruling has one or more Payments. I want a Map with the caseId as key and the set of payments as value (i.e. a Map<Long, Set<Payment>>).

Cases are not unique across rows, but rulings are.

In other words, I can have several rows with the same case, but they will have unique rulings.

The following code gets me a Map<Long, Set<Set<Payment>>>, which is almost what I want, but I've been struggling to find the correct way to flatMap the final set in this context. So far I've worked around it by using this map as is, but I'd much rather fix the algorithm so that it combines the payments into one single set instead of creating a set of sets.

I've searched around and couldn't find a problem with the same kind of iteration, although flatMapping with Java streams seems like a somewhat popular topic.

rowData.stream()
        .collect(Collectors.groupingBy(
            r -> r.case.getCaseId(),
            Collectors.mapping(
                r -> r.getRuling(),
                Collectors.mapping(ruling->
                    ruling.getPayments(),
                    Collectors.toSet()
                )
            )));

No, they are not unique. The input data contains duplicate cases but they all have unique rulings for that specific case. So you could argue the input data is not ideal :P — Lars Holdaas, Sep 28 '18 at 08:20
Could you provide the code of the involved classes? (`ReportData`, `Ruling`, `Payment`) This will make very clear what each one is. Also, the code you provided clearly does not compile: there is a `case` field that is referenced (reserved keyword), and it seems `r.getRulings()` should be `r.getRuling()`. — Didier L, Sep 28 '18 at 08:45
@DidierL you're right, I edited getRulings() to getRuling(). As for the case, I just quickly translated the field names (we don't use english for field names etc. in this project). You're right, it is a reserved keyword and that wouldn't be allowed. — Lars Holdaas, Sep 28 '18 at 08:49
Next time, you should provide a [mcve]. Here I had to guess all your class structures from what I understood of your question to be able to work in my IDE. I think @marstran just wrote the answer inline but thus he was certainly not able to validate it. — Didier L, Sep 28 '18 at 09:55

Ousmane D. · Accepted Answer · 2018-09-28T09:58:01.520

5

Another JDK8 solution:

Map<Long, Set<Payment>> resultSet = 
         rowData.stream()
                .collect(Collectors.toMap(p -> p.Case.getCaseId(),
                        p -> new HashSet<>(p.getRuling().getPayments()),
                        (l, r) -> { l.addAll(r);return l;}));

or as of JDK9 you can use the flatMapping collector:

rowData.stream()
       .collect(Collectors.groupingBy(r -> r.Case.getCaseId(), 
              Collectors.flatMapping(e -> e.getRuling().getPayments().stream(), 
                        Collectors.toSet())));
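For anyone who wants to run the two variants above side by side, here is a minimal, self-contained sketch. The `ReportData` and `Ruling` classes are hypothetical stand-ins guessed from the question (the real ones were never posted), and payments are plain strings to keep it short. `withFlatMapping` needs Java 9 or later.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class PaymentsByCase {
    // Hypothetical stand-ins for the asker's classes; payments are plain strings here.
    static final class Ruling {
        private final Set<String> payments;
        Ruling(String... payments) { this.payments = new HashSet<>(Arrays.asList(payments)); }
        Set<String> getPayments() { return payments; } // exposes the internal, mutable set
    }

    static final class ReportData {
        private final Long caseId;
        private final Ruling ruling;
        ReportData(Long caseId, Ruling ruling) { this.caseId = caseId; this.ruling = ruling; }
        Long getCaseId() { return caseId; }
        Ruling getRuling() { return ruling; }
    }

    // Java 8: toMap with a defensive copy, so the merge never touches a ruling's own set.
    static Map<Long, Set<String>> withToMap(List<ReportData> rows) {
        return rows.stream().collect(Collectors.toMap(
                ReportData::getCaseId,
                r -> new HashSet<>(r.getRuling().getPayments()),
                (left, right) -> { left.addAll(right); return left; }));
    }

    // Java 9+: flatMapping streams each row's payments into one set per case.
    static Map<Long, Set<String>> withFlatMapping(List<ReportData> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                ReportData::getCaseId,
                Collectors.flatMapping(r -> r.getRuling().getPayments().stream(),
                        Collectors.toSet())));
    }

    public static void main(String[] args) {
        List<ReportData> rows = Arrays.asList(
                new ReportData(1L, new Ruling("p1", "p2")),
                new ReportData(1L, new Ruling("p3")),
                new ReportData(2L, new Ruling("p4")));
        // Both variants should agree on the grouped result.
        System.out.println(withToMap(rows).equals(withFlatMapping(rows)));
        // The first ruling still holds only its own two payments.
        System.out.println(rows.get(0).getRuling().getPayments().size());
    }
}
```

In this sketch `getPayments()` hands out the ruling's own mutable set, so the `new HashSet<>(...)` copy in the `toMap` variant is what keeps the source data intact: without it, the merge function's `addAll` would add the second row's payments to the first ruling.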

edited Sep 28 '18 at 09:58

answered Sep 28 '18 at 09:06


@DidierL yup true, given the previous edit. a bit rusty with Java lately. thanks. – Ousmane D. Sep 28 '18 at 09:52
You should keep the `new HashSet()` wrapper however, otherwise you are modifying the ruling's payments in the source collection. – Didier L Sep 28 '18 at 09:54
@DidierL Right, will keep it then. ;-) – Ousmane D. Sep 28 '18 at 09:58
This seems to be the way to go – Lars Holdaas Sep 28 '18 at 09:59
…and if you are still forced to use Java 8, you’ll find a Java 8 compatible implementation of the `flatMapping` collector at the end of [this answer](https://stackoverflow.com/a/39131049/2711488). Since it has the same signature, it is easy to migrate from that solution to the standard API version in the future. – Holger Sep 28 '18 at 11:12
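To illustrate the comment above: the sketch below shows what a Java 8 compatible `flatMapping` collector could look like. It is not copied from the linked answer; it is a minimal version written under the assumption that it should mirror the signature of Java 9's `Collectors.flatMapping`. The `Row` class and the `paymentsByCase` helper are hypothetical and only there to make the example runnable.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Java8FlatMapping {
    // Hypothetical backport with the same shape as Java 9's Collectors.flatMapping.
    static <T, U, A, R> Collector<T, ?, R> flatMapping(
            Function<? super T, ? extends Stream<? extends U>> mapper,
            Collector<? super U, A, R> downstream) {
        BiConsumer<A, ? super U> accumulator = downstream.accumulator();
        return Collector.of(
                downstream.supplier(),
                (container, element) -> {
                    // Close each mapped stream after use; a null stream counts as empty.
                    try (Stream<? extends U> mapped = mapper.apply(element)) {
                        if (mapped != null) {
                            mapped.forEachOrdered(u -> accumulator.accept(container, u));
                        }
                    }
                },
                downstream.combiner(),
                downstream.finisher(),
                downstream.characteristics().toArray(new Collector.Characteristics[0]));
    }

    // Hypothetical flattened row: a case id plus the payments of its single ruling.
    static final class Row {
        final Long caseId;
        final Set<String> payments;
        Row(Long caseId, String... payments) {
            this.caseId = caseId;
            this.payments = new HashSet<>(Arrays.asList(payments));
        }
    }

    // Groups by case id and unions the payment sets of all rows in each group.
    static Map<Long, Set<String>> paymentsByCase(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                row -> row.caseId,
                flatMapping(row -> row.payments.stream(), Collectors.toSet())));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1L, "p1", "p2"),
                new Row(1L, "p2", "p3"),
                new Row(2L, "p4"));
        System.out.println(paymentsByCase(rows));
    }
}
```

Because the helper takes the same arguments as the Java 9 collector, switching later should only mean replacing the call with `Collectors.flatMapping`.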

score 2 · Answer 2 · answered Sep 28 '18 at 09:18

The cleanest solution is to define your own collector:

Map<Long, Set<Payment>> result = rowData.stream()
        .collect(Collectors.groupingBy(
                ReportData::getCaseId,
                Collector.of(HashSet::new,
                        (s, r) -> s.addAll(r.getRuling().getPayments()),
                        (s1, s2) -> { s1.addAll(s2); return s1; })
        ));

Two other solutions that I thought of first. They are less efficient and less readable, but they still avoid constructing the intermediate Map:

Merging the inner sets using Collectors.reducing():

Map<Long, Set<Payment>> result = rowData.stream()
        .collect(Collectors.groupingBy(
                ReportData::getCaseId,
                Collectors.reducing(Collections.emptySet(),
                        r -> r.getRuling().getPayments(),
                        (s1, s2) -> {
                            Set<Payment> r = new HashSet<>(s1);
                            r.addAll(s2);
                            return r;
                        })
        ));

where the reducing operation merges the Set<Payment> of entries with the same caseId. However, every merge creates a new HashSet, so this can produce a lot of set copies when many merges are needed.

Another solution is with a downstream collector that flatmaps the nested collections:

Map<Long, Set<Payment>> result = rowData.stream()
        .collect(Collectors.groupingBy(
                ReportData::getCaseId,
                Collectors.collectingAndThen(
                        Collectors.mapping(r -> r.getRuling().getPayments(), Collectors.toList()),
                        s -> s.stream().flatMap(Set::stream).collect(Collectors.toSet())))
        );

Basically it puts all sets of matching caseId together in a List, then flatmaps that list into a single Set.

marstran · Answer 3 · 2018-09-28T08:40:24.127

1

There are probably better ways to do this, but this is the best I found:

 Map<Long, Set<Payment>> result =
            rowData.stream()
                    // First group by caseIds.
                    .collect(Collectors.groupingBy(r -> r.Case.getCaseId()))
                    .entrySet().stream()
                    // By streaming over the entrySet, I map the values to the set of payments.
                    .collect(Collectors.toMap(
                            Map.Entry::getKey,
                            entry -> entry.getValue().stream()
                                    .flatMap(r -> r.getRuling().getPayments().stream())
                                    .collect(Collectors.toSet())));

edited Sep 28 '18 at 08:40

answered Sep 28 '18 at 08:34


Oops, I just realized I gave wrong information. There's actually only one Ruling per row. I got confused because the input data itself contains duplicates of cases, but there is only one case and one ruling per dataRow. So sorry! – Lars Holdaas Sep 28 '18 at 08:38
@LarsHoldaas Ok, I edited the answer to only have one ruling per case. The idea should be the same, right? – marstran Sep 28 '18 at 08:45
Oh yeah! This seems to work like a charm! Thanks a lot :D – Lars Holdaas Sep 28 '18 at 08:45
Look at @Aomine's answer for a better solution if you're on Java 9. By using the `flatMapping` collector, you no longer need the intermediate map from `groupingBy`. – marstran Sep 28 '18 at 09:27
