
I have a few classes like the ones below:

class Pojo {
    List<Item> items;
}

class Item {
    T key1;
    List<SubItem> subItems;
}

class SubItem {
    V key2;
    Object otherAttribute1;
}

I want to aggregate the items based on key1, and within each aggregation, the sub-items should be aggregated by key2, giving the following structure:

Map<T, Map<V, List<SubItem>>>

How can I do this by nesting Java 8 Collectors.groupingBy?

I tried the following and got stuck halfway:

pojo.getItems()
    .stream()
    .collect(
        Collectors.groupingBy(Item::getKey1, /* How to group by here SubItem::getKey2*/)
    );

Note: this is not the same as a cascaded groupingBy, which performs multilevel aggregation based on fields of the same object, as discussed here.
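For reference, the requested shape can also be built with plain loops, which makes the difference from a cascaded groupingBy explicit: key1 is read from each Item, while key2 is read from that Item's nested SubItems. This is a minimal sketch that assumes T = String and V = Integer, replaces otherAttribute1 with a hypothetical `name` field, and uses illustrative names (`NestedGroupingLoop`, `group`, `sample`).

```java
import java.util.*;

public class NestedGroupingLoop {
    // Hypothetical concrete types: T = String, V = Integer
    static final class SubItem {
        final Integer key2;
        final String name; // stands in for otherAttribute1
        SubItem(Integer key2, String name) { this.key2 = key2; this.name = name; }
        Integer getKey2() { return key2; }
        @Override public String toString() { return name; }
    }

    static final class Item {
        final String key1;
        final List<SubItem> subItems;
        Item(String key1, List<SubItem> subItems) { this.key1 = key1; this.subItems = subItems; }
        String getKey1() { return key1; }
        List<SubItem> getSubItems() { return subItems; }
    }

    // Two items share key1 "a", so their sub-items end up in the same inner map
    static List<Item> sample() {
        return Arrays.asList(
            new Item("a", Arrays.asList(new SubItem(1, "a1-x"), new SubItem(2, "a2-y"))),
            new Item("a", Arrays.asList(new SubItem(1, "a1-z"))),
            new Item("b", Arrays.asList(new SubItem(1, "b1-w"))));
    }

    // key1 (read from Item) -> key2 (read from SubItem) -> sub-items
    static Map<String, Map<Integer, List<SubItem>>> group(List<Item> items) {
        Map<String, Map<Integer, List<SubItem>>> result = new HashMap<>();
        for (Item item : items) {
            Map<Integer, List<SubItem>> byKey2 =
                result.computeIfAbsent(item.getKey1(), k -> new HashMap<>());
            for (SubItem sub : item.getSubItems()) {
                byKey2.computeIfAbsent(sub.getKey2(), k -> new ArrayList<>()).add(sub);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(group(sample()));
    }
}
```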

asked by Sandesh Kumar
  • Can you try this? `pojo.getItems().stream().collect(Collectors.groupingBy(Item::getKey1)).entrySet().stream() .collect(Collectors.toMap(Entry::getKey, (e)-> e.getValue().stream().map(it -> it.getSubItems().stream().collect(Collectors.groupingBy(SubItem::getKey2)))));` I'm not sure about it. – David Pérez Cabrera Aug 24 '16 at 18:52
  • @David Pérez Cabrera: I think that’ll work, but fully collecting everything into a temporary `Map<T, List<Item>>` can be quite expensive. – Holger Aug 24 '16 at 18:58
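The two-step idea from the first comment (group whole Items by key1, then regroup each group) can be written as a single collector with collectingAndThen. As written, the comment's `map(...)` step would leave a Stream of per-item maps as each value, so this sketch flattens the sub-items with flatMap before grouping them by key2. It also shows where the cost comes from: every key1 group is first materialized as a temporary List<Item>. The types (T = String, V = Integer, a hypothetical `name` field for otherAttribute1) and the names `TwoStepGrouping`, `group` and `sample` are illustrative assumptions.

```java
import java.util.*;
import java.util.stream.Collectors;

public class TwoStepGrouping {
    // Hypothetical concrete types: T = String, V = Integer
    static final class SubItem {
        final Integer key2;
        final String name; // stands in for otherAttribute1
        SubItem(Integer key2, String name) { this.key2 = key2; this.name = name; }
        Integer getKey2() { return key2; }
        @Override public String toString() { return name; }
    }

    static final class Item {
        final String key1;
        final List<SubItem> subItems;
        Item(String key1, List<SubItem> subItems) { this.key1 = key1; this.subItems = subItems; }
        String getKey1() { return key1; }
        List<SubItem> getSubItems() { return subItems; }
    }

    static List<Item> sample() {
        return Arrays.asList(
            new Item("a", Arrays.asList(new SubItem(1, "a1-x"), new SubItem(2, "a2-y"))),
            new Item("a", Arrays.asList(new SubItem(1, "a1-z"))),
            new Item("b", Arrays.asList(new SubItem(1, "b1-w"))));
    }

    static Map<String, Map<Integer, List<SubItem>>> group(List<Item> items) {
        return items.stream()
            .collect(Collectors.groupingBy(Item::getKey1,
                // step 1: collect each key1 group into a temporary List<Item>
                Collectors.collectingAndThen(Collectors.toList(),
                    // step 2: flatten that group's sub-items and regroup them by key2
                    (List<Item> group) -> group.stream()
                        .flatMap(item -> item.getSubItems().stream())
                        .collect(Collectors.groupingBy(SubItem::getKey2)))));
    }

    public static void main(String[] args) {
        System.out.println(group(sample()));
    }
}
```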

1 Answer


You can’t group a single item by multiple keys, unless you accept the item to potentially appear in multiple groups. In that case, you want to perform a kind of flatMap operation.
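To illustrate what appearing in multiple groups means, consider grouping whole Items by the key2 values of their sub-items. This sketch uses flatMap to emit one (key2, item) entry per distinct key2 of each item, so an item whose sub-items carry two different key2 values ends up in two groups. It assumes T = String, V = Integer and a hypothetical `name` field; `ItemsByKey2`, `groupItemsByKey2` and `sample` are illustrative names.

```java
import java.util.*;
import java.util.stream.Collectors;

public class ItemsByKey2 {
    // Hypothetical concrete types: T = String, V = Integer
    static final class SubItem {
        final Integer key2;
        final String name; // stands in for otherAttribute1
        SubItem(Integer key2, String name) { this.key2 = key2; this.name = name; }
        Integer getKey2() { return key2; }
    }

    static final class Item {
        final String key1;
        final List<SubItem> subItems;
        Item(String key1, List<SubItem> subItems) { this.key1 = key1; this.subItems = subItems; }
        String getKey1() { return key1; }
        List<SubItem> getSubItems() { return subItems; }
        @Override public String toString() { return key1; }
    }

    // Item "a" has sub-items with key2 = 1 and key2 = 2
    static List<Item> sample() {
        return Arrays.asList(
            new Item("a", Arrays.asList(new SubItem(1, "x"), new SubItem(2, "y"))),
            new Item("b", Arrays.asList(new SubItem(1, "z"))));
    }

    // Groups whole Items by their sub-items' key2 values; one Item may land in several groups
    static Map<Integer, List<Item>> groupItemsByKey2(List<Item> items) {
        return items.stream()
            .flatMap(item -> item.getSubItems().stream()
                .map(SubItem::getKey2)
                .distinct() // at most one entry per (key2, item) pair
                .map(k2 -> new AbstractMap.SimpleImmutableEntry<>(k2, item)))
            .collect(Collectors.groupingBy(Map.Entry::getKey,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(groupItemsByKey2(sample()));
    }
}
```

Here item "a" is listed under both key 1 and key 2, while "b" appears only under key 1.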

One way to achieve this is to use Stream.flatMap with a temporary pair holding each combination of Item and SubItem before collecting. Since there is no standard pair type, a typical solution is to use Map.Entry for that:

Map<T, Map<V, List<SubItem>>> result = pojo.getItems().stream()
    // pair each SubItem with the key1 of the Item it belongs to
    .flatMap(item -> item.getSubItems().stream()
        .map(sub -> new AbstractMap.SimpleImmutableEntry<>(item.getKey1(), sub)))
    // outer level: group the pairs by key1; inner level: unwrap the SubItem and group by key2
    .collect(Collectors.groupingBy(AbstractMap.SimpleImmutableEntry::getKey,
                Collectors.mapping(Map.Entry::getValue,
                    Collectors.groupingBy(SubItem::getKey2))));
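A self-contained, Java 8 compatible sketch of the Map.Entry approach, assuming T = String, V = Integer and a hypothetical `name` field in place of otherAttribute1; the class and helper names (`EntryPairGrouping`, `group`, `sample`) are illustrative.

```java
import java.util.*;
import java.util.stream.Collectors;

public class EntryPairGrouping {
    // Hypothetical concrete types: T = String, V = Integer
    static final class SubItem {
        final Integer key2;
        final String name; // stands in for otherAttribute1
        SubItem(Integer key2, String name) { this.key2 = key2; this.name = name; }
        Integer getKey2() { return key2; }
        @Override public String toString() { return name; }
    }

    static final class Item {
        final String key1;
        final List<SubItem> subItems;
        Item(String key1, List<SubItem> subItems) { this.key1 = key1; this.subItems = subItems; }
        String getKey1() { return key1; }
        List<SubItem> getSubItems() { return subItems; }
    }

    static List<Item> sample() {
        return Arrays.asList(
            new Item("a", Arrays.asList(new SubItem(1, "a1-x"), new SubItem(2, "a2-y"))),
            new Item("a", Arrays.asList(new SubItem(1, "a1-z"))),
            new Item("b", Arrays.asList(new SubItem(1, "b1-w"))));
    }

    static Map<String, Map<Integer, List<SubItem>>> group(List<Item> items) {
        return items.stream()
            // pair every sub-item with its parent's key1
            .flatMap(item -> item.getSubItems().stream()
                .map(sub -> new AbstractMap.SimpleImmutableEntry<>(item.getKey1(), sub)))
            // outer level: group the pairs by key1...
            .collect(Collectors.groupingBy(AbstractMap.SimpleImmutableEntry::getKey,
                // ...inner level: unwrap each sub-item and group it by key2
                Collectors.mapping(Map.Entry::getValue,
                    Collectors.groupingBy(SubItem::getKey2))));
    }

    public static void main(String[] args) {
        System.out.println(group(sample()));
    }
}
```

With the sample data, "a" maps to key2 1 with [a1-x, a1-z] and key2 2 with [a2-y], and "b" maps to key2 1 with [b1-w]. The cost of this variant is one small entry object per SubItem.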

An alternative that avoids these temporary objects would be to perform the flatMap operation inside the collector, but unfortunately, Collectors.flatMapping isn't available until Java 9.

With it, the solution looks like this:

Map<T, Map<V, List<SubItem>>> result = pojo.getItems().stream()
    .collect(Collectors.groupingBy(Item::getKey1,
                Collectors.flatMapping(item -> item.getSubItems().stream(),
                    Collectors.groupingBy(SubItem::getKey2))));
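On Java 9 or later, the snippet runs unchanged against the JDK's Collectors.flatMapping. A self-contained sketch, assuming T = String, V = Integer, a hypothetical `name` field, and illustrative names (`FlatMappingJava9`, `group`, `sample`):

```java
import java.util.*;
import java.util.stream.Collectors;

public class FlatMappingJava9 {
    // Hypothetical concrete types: T = String, V = Integer
    static final class SubItem {
        final Integer key2;
        final String name; // stands in for otherAttribute1
        SubItem(Integer key2, String name) { this.key2 = key2; this.name = name; }
        Integer getKey2() { return key2; }
        @Override public String toString() { return name; }
    }

    static final class Item {
        final String key1;
        final List<SubItem> subItems;
        Item(String key1, List<SubItem> subItems) { this.key1 = key1; this.subItems = subItems; }
        String getKey1() { return key1; }
        List<SubItem> getSubItems() { return subItems; }
    }

    static List<Item> sample() {
        return Arrays.asList(
            new Item("a", Arrays.asList(new SubItem(1, "a1-x"), new SubItem(2, "a2-y"))),
            new Item("a", Arrays.asList(new SubItem(1, "a1-z"))),
            new Item("b", Arrays.asList(new SubItem(1, "b1-w"))));
    }

    // Requires Java 9+: flatMapping feeds each item's sub-items straight into the inner groupingBy
    static Map<String, Map<Integer, List<SubItem>>> group(List<Item> items) {
        return items.stream()
            .collect(Collectors.groupingBy(Item::getKey1,
                Collectors.flatMapping(item -> item.getSubItems().stream(),
                    Collectors.groupingBy(SubItem::getKey2))));
    }

    public static void main(String[] args) {
        System.out.println(group(sample()));
    }
}
```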

If we don’t want to wait for Java 9, we can add a similar collector to our own code base, as it isn’t hard to implement:

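// Java 8 stand-in for Java 9's Collectors.flatMapping
// (needs java.util.function.BiConsumer, java.util.function.Function,
//  java.util.stream.Collector and java.util.stream.Stream)
static <T,U,A,R> Collector<T,?,R> flatMapping(
    Function<? super T,? extends Stream<? extends U>> mapper,
    Collector<? super U,A,R> downstream) {

    BiConsumer<A, ? super U> acc = downstream.accumulator();
    return Collector.of(downstream.supplier(),
        // map each input to a stream and feed its elements to the downstream accumulator;
        // try-with-resources closes the stream, and a null stream is treated as empty
        (a, t) -> { try(Stream<? extends U> s=mapper.apply(t)) {
            if(s!=null) s.forEachOrdered(u -> acc.accept(a, u));
        }},
        // everything else is delegated unchanged to the downstream collector
        downstream.combiner(), downstream.finisher(),
        downstream.characteristics().toArray(new Collector.Characteristics[0]));
}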
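A sketch of the backport in use on Java 8: the helper is copied from above and called unqualified, so the call site reads like the Java 9 version. The types (T = String, V = Integer, a hypothetical `name` field) and the names `FlatMappingBackport`, `group` and `sample` are illustrative assumptions.

```java
import java.util.*;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMappingBackport {
    // Hypothetical concrete types: T = String, V = Integer
    static final class SubItem {
        final Integer key2;
        final String name; // stands in for otherAttribute1
        SubItem(Integer key2, String name) { this.key2 = key2; this.name = name; }
        Integer getKey2() { return key2; }
        @Override public String toString() { return name; }
    }

    static final class Item {
        final String key1;
        final List<SubItem> subItems;
        Item(String key1, List<SubItem> subItems) { this.key1 = key1; this.subItems = subItems; }
        String getKey1() { return key1; }
        List<SubItem> getSubItems() { return subItems; }
    }

    // The backport from the answer: stream each element, feed the results downstream
    static <T,U,A,R> Collector<T,?,R> flatMapping(
        Function<? super T,? extends Stream<? extends U>> mapper,
        Collector<? super U,A,R> downstream) {

        BiConsumer<A, ? super U> acc = downstream.accumulator();
        return Collector.of(downstream.supplier(),
            (a, t) -> { try(Stream<? extends U> s=mapper.apply(t)) {
                if(s!=null) s.forEachOrdered(u -> acc.accept(a, u));
            }},
            downstream.combiner(), downstream.finisher(),
            downstream.characteristics().toArray(new Collector.Characteristics[0]));
    }

    static List<Item> sample() {
        return Arrays.asList(
            new Item("a", Arrays.asList(new SubItem(1, "a1-x"), new SubItem(2, "a2-y"))),
            new Item("a", Arrays.asList(new SubItem(1, "a1-z"))),
            new Item("b", Arrays.asList(new SubItem(1, "b1-w"))));
    }

    // Same shape as the Java 9 call, but using the local flatMapping above
    static Map<String, Map<Integer, List<SubItem>>> group(List<Item> items) {
        return items.stream()
            .collect(Collectors.groupingBy(Item::getKey1,
                flatMapping(item -> item.getSubItems().stream(),
                    Collectors.groupingBy(SubItem::getKey2))));
    }

    public static void main(String[] args) {
        System.out.println(group(sample()));
    }
}
```

Because the helper has the same signature as the JDK method, switching to Java 9 later only requires replacing the unqualified call with `Collectors.flatMapping`.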
answered by Holger
  • Thanks @Holger, can you elaborate more on "accept the item to potentially appear in multiple groups". I didn't get this part. – Sandesh Kumar Aug 31 '16 at 08:08
  • If you group to a `Map<T, List<Item>>`, the `T` keys are unique, but when you multi-group into a `Map<T, Map<V, List<SubItem>>>`, the `V` keys are only unique within their sub-map. Depending on your input data structure and actual goal, though, this is likely not an issue. – Holger Aug 31 '16 at 10:30
  • In the `Map<T, Map<V, List<SubItem>>>` case, is each combination of T and V unique? That is, if a V1 appears under both a T1 and a T2, will T2's entry for V1 include the T1 item? I'm not sure if I'm being clear... – El Suscriptor Justiciero Jan 02 '18 at 10:11
  • @ElSuscriptorJusticiero: I'm not sure I understand your question correctly, but while your `V1` is not unique, `result.get(T1).get(V1)` and `result.get(T2).get(V1)` will return distinct lists containing different items. – Holger Jan 02 '18 at 15:53
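A small sketch of the uniqueness point: in the nested result, the same key2 value can occur under several key1 keys, each with its own list. It assumes Java 9+ (for Collectors.flatMapping), T = String, V = Integer, a hypothetical `name` field, and illustrative names (`SubMapKeys`, `combinations`, `distinctKey2`).

```java
import java.util.*;
import java.util.stream.Collectors;

public class SubMapKeys {
    // Hypothetical concrete types: T = String, V = Integer
    static final class SubItem {
        final Integer key2;
        final String name; // stands in for otherAttribute1
        SubItem(Integer key2, String name) { this.key2 = key2; this.name = name; }
        Integer getKey2() { return key2; }
        @Override public String toString() { return name; }
    }

    static final class Item {
        final String key1;
        final List<SubItem> subItems;
        Item(String key1, List<SubItem> subItems) { this.key1 = key1; this.subItems = subItems; }
        String getKey1() { return key1; }
        List<SubItem> getSubItems() { return subItems; }
    }

    // key2 = 1 occurs under both "a" and "b"
    static List<Item> sample() {
        return Arrays.asList(
            new Item("a", Arrays.asList(new SubItem(1, "a1-x"), new SubItem(2, "a2-y"))),
            new Item("a", Arrays.asList(new SubItem(1, "a1-z"))),
            new Item("b", Arrays.asList(new SubItem(1, "b1-w"))));
    }

    static Map<String, Map<Integer, List<SubItem>>> group(List<Item> items) {
        return items.stream()
            .collect(Collectors.groupingBy(Item::getKey1,
                Collectors.flatMapping(item -> item.getSubItems().stream(),
                    Collectors.groupingBy(SubItem::getKey2))));
    }

    // Number of (key1, key2) combinations: keys are unique per sub-map
    static long combinations(Map<String, Map<Integer, List<SubItem>>> result) {
        return result.values().stream().mapToLong(Map::size).sum();
    }

    // Number of distinct key2 values across all sub-maps
    static long distinctKey2(Map<String, Map<Integer, List<SubItem>>> result) {
        return result.values().stream()
            .flatMap(m -> m.keySet().stream())
            .distinct()
            .count();
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, List<SubItem>>> r = group(sample());
        System.out.println(combinations(r) + " combinations, " + distinctKey2(r) + " distinct key2 values");
        System.out.println(r.get("a").get(1) + " vs " + r.get("b").get(1));
    }
}
```

With this sample, key2 = 1 appears under both "a" and "b", so there are three (key1, key2) combinations but only two distinct key2 values, and `get("a").get(1)` and `get("b").get(1)` are separate lists holding different sub-items.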