6

My class has two fields:

  • MyKey - the key that I want to group by
  • Set<MyEnum> - the set that I want to be flattened and merged.

I have a list of such objects, and what I want is to obtain a Map<MyKey, Set<MyEnum> of which the value is joined from all myEnums of the objects with this key.

For example, if I have three objects:

  1. myKey: key1, myEnums: [E1]
  2. myKey: key1, myEnums: [E2]
  3. myKey: key2, myEnums: [E1, E3]

The expected result should be:

key1 => [E1, E2], key2 => [E1, E3]

I came up with this code:

Map<MyKey, Set<MyEnum>> map = myObjs.stream()
        .collect(Collectors.groupingBy(
                MyType::getMyKey,
                Collectors.reducing(
                        new HashSet<MyEnum>(),
                        MyType::getMyEnums,
                        (a, b) -> {
                            a.addAll(b);
                            return a;
                        })));

There're two problems with it:

  1. The HashSet inside the reducing seems to be shared between all keys. That being said the actual run result of the above example is key1 => [E1, E2, E3], key2 => [E1, E2, E3]. Why is it the case?

  2. Even if this code works, it looks ugly especially at the part of reducing that I have to handle the logic of constructing the joined collection manually. Is there a better way of doing this?

Thank you!

Stefan Zobel
  • 3,182
  • 7
  • 28
  • 38
MGhostSoft
  • 528
  • 4
  • 13

2 Answers2

7

Notice that you are only ever creating one identity object: new HashSet<MyEnum>().

The BinaryOperator you supply as the third argument must be idempotent, the same way common math operators are, e.g. x = y + z doesn't change the value of y and z.

This means you need to merge the two input sets a and b, without updating either.

Also, working with enums, you should use EnumSet, not HashSet.

Map<MyKey, Set<MyEnum>> map = myObjs.stream()
        .collect(Collectors.groupingBy(
                    MyType::getMyKey,
                    Collectors.reducing(
                        EnumSet.noneOf(MyEnum.class), // <-- EnumSet
                        MyType::getMyEnums,
                        (a, b) -> {
                            EnumSet<MyEnum> c = EnumSet.copyOf(a); // <-- copy
                            c.addAll(b);
                            return c;
                        })));

UPDATE

Shorter, more streamlined version, that doesn't have to keep creating new sets while accumulating the result:

Map<MyKey, Set<MyEnum>> map = myObjs.stream()
        .collect(Collectors.groupingBy(
                    MyType::getMyKey,
                    Collector.of(
                            () -> EnumSet.noneOf(MyEnum.class),
                            (r, myObj) -> r.addAll(myObj.getMyEnums()),
                            (r1, r2) -> { r1.addAll(r2); return r1; }
                    )));
Andreas
  • 154,647
  • 11
  • 152
  • 247
  • Thank you! It works. Is there a way to simplify the reducing? If I were to operate on a stream of these values, I could flatMap them to streams and then collect them together so that the collections are flattened.. Is there a way to do such things in collectors? – MGhostSoft Aug 29 '16 at 22:06
  • It’s correct that the reduction operator must not modify it’s arguments, but this has nothing to do with Idempotence. For math operators, the concept of modifying a value simply does not exist (e.g. how do you modify a number?)… – Holger Aug 30 '16 at 16:07
  • 1
    @Holger You're right. I should have said that the `BinaryOperator` method must be a [pure function](https://en.wikipedia.org/wiki/Pure_function). – Andreas Aug 30 '16 at 16:11
  • 1
    @MGhostSoft: see also [this answer](http://stackoverflow.com/a/39131049/2711488). Your task is a bit simpler, i.e. you’ll have to replace the innermost `groupingBy(…)` collector of that answer by either `toSet()` or `toCollection(() -> EnumSet.noneOf(MyEnum.class))`, but besides that, the pattern is the same. – Holger Aug 30 '16 at 16:17
0

Not ideal, but using a mutable container makes it fairly easy to understand.

myObjs.stream()
  .collect(groupingBy(MyType::getMyKey)
  .entrySet().stream()
  .collect(toMap(
    Map.Entry::getKey, 
    e -> e.getValue()
      .stream()
      .flatMap(v -> v.getMyEnums().stream())
      .collect(toSet())
  )

Collectors.mapping(Function, Collector) is so nearly a perfect fit for what you want to do here, if only it were Collectors.flatMapping

EDIT: until java 9 is out, there is a handy implementation of flatMapping in this answer. With it our solution looks like this:

myObjs.stream()
  .collect(
    groupingBy(MyType::getMyKey,
    flatMapping(v -> v.getMyEnums().stream(), toSet())
  );
Community
  • 1
  • 1
kag0
  • 5,624
  • 7
  • 34
  • 67
  • Almost works except for that the last three lines should be: .flatMap(o -> o.getMyEnums().stream()) .collect(toSet()) ) – MGhostSoft Aug 29 '16 at 22:48
  • 2
    See the end of [this answer](http://stackoverflow.com/a/39131049/2711488) for an implementation of a `flatMapping` collector. It’s semantics is the same as [Java 9’s `flatMapping` collector](http://download.java.net/java/jdk9/docs/api/java/util/stream/Collectors.html#flatMapping-java.util.function.Function-java.util.stream.Collector-) or the `Stream`’s `flatMap` operation. – Holger Aug 30 '16 at 16:20
  • @Holger This is exactly what I'm looking for! – MGhostSoft Aug 30 '16 at 19:31
  • @Holger Actually I wonder how you knew that? I've read several articles on Java 9 new features but none of them mentioned that. Even now I'm having a hard time collecting the small new features added into Java 8 like Objects.isNull(). – MGhostSoft Aug 30 '16 at 19:38
  • @Holger good implementation, will do nicely to tide these use cases over until java 9. – kag0 Aug 30 '16 at 19:56
  • 1
    @MGhostSoft: well, there’s the generated API doc, which you can browse and look for `Since 8` or `Since 9` tags. As these classes and members are well tagged, it’s a shame that javadoc doesn’t take the opportunity to generate a “What’s New” page, linking to them, automatically. – Holger Aug 31 '16 at 10:40