
I recently learned about streams in Java 8 and started to work with them. Now I have a question regarding the groupingBy collector method:

Usually I work with .NET, so I compared (knowing they are not the same) Java Stream<T> with .NET IEnumerable<T>. Following this comparison, List<T> stores elements and the particular Stream/IEnumerable applies operations. One example:

C#:

elements.Where(x => x.Value == 5).ToList();

Java:

elements.stream().filter(x -> x.getValue() == 5).collect(Collectors.toList());

In both examples, I start with a list, define operations (a filter in this example) and collect the result to store it (in a new list in this example).

Now I have a more complex case:

data.stream()
    .map( ... ).filter( ... ) // Some operations
    .collect(groupingBy(Chunk::getName, summingLong(Chunk::getValue)));

The result of this query is a Map<String, Long>, and I can work with this, but let's say I want to keep processing this data instead of storing it. My current approach is trivial:

    ...
    .collect(groupingBy(Chunk::getName, summingLong(Chunk::getValue)))
    .entrySet().stream()
    .map( ... ) // Do more operations

But this way, I leave the stream, store the first result in a Map and open a new stream to continue. Is there a way to group without a collector, so that I can "stay" in the stream?

Lukas Körfer
  • Since you're grouping, you pretty much have to go to an intermediate store - what if you need to group a value at the start of the stream with a value at the end? You'd need to process "everything", storing it until you're sure there's nothing more to group with. So internally, Java would have to (/could?) put things in a map (or some similar structure) anyway; what's so bad about doing this yourself? – Andy Turner Jan 09 '17 at 13:22
  • It's not really a problem, I can live with my trivial approach, but in all my other stream operations there was a clear line between having a `List` and having a `Stream`. I opened it, operated on it and collected it. Now there is a Map that gets opened for a new stream, so I was wondering if it's possible with another approach. – Lukas Körfer Jan 09 '17 at 13:27
  • Maybe I got confused by my .NET background. There you can do any operation on an `IEnumerable` and in the end you collect your result via `ToList`, `ToArray`, `ToDictionary` or by iterating over it. – Lukas Körfer Jan 09 '17 at 13:31
  • This could be achieved by creating a custom `Spliterator`. Here is a post which describes how to turn a simple `groupBy` into a non-terminal, lazy `partitionBy`: [Partition a stream by a discriminator function](http://stackoverflow.com/a/28363324/7274990) – Calculator Jan 09 '17 at 14:19

2 Answers


You can do whatever you like in the downstream collector, as long as you can describe the operation as a Collector. Currently, there is only an equivalent to the intermediate operation map, the mapping collector, but Java 9 will also add filtering and flatMapping (which you could also implement yourself in Java 8) and there’s already an equivalent to almost every terminal operation.
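As a minimal sketch of doing work inside the downstream collector, the following applies the mapping collector per group. The `Chunk` class here is a hypothetical stand-in for the type in the question, and `distinctValuesByName` is an illustrative name, not something from the original post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class DownstreamMappingExample {

    // Hypothetical stand-in for the Chunk type used in the question.
    static final class Chunk {
        final String name;
        final long value;

        Chunk(String name, long value) {
            this.name = name;
            this.value = value;
        }

        String getName() { return name; }
        long getValue() { return value; }
    }

    // Groups chunks by name; within each group, maps every chunk to its
    // value (the collector equivalent of Stream.map) and gathers the
    // distinct values into a Set.
    static Map<String, Set<Long>> distinctValuesByName(List<Chunk> data) {
        return data.stream()
            .collect(Collectors.groupingBy(
                Chunk::getName,
                Collectors.mapping(Chunk::getValue, Collectors.toSet())));
    }

    public static void main(String[] args) {
        List<Chunk> data = Arrays.asList(
            new Chunk("a", 1), new Chunk("a", 1),
            new Chunk("a", 2), new Chunk("b", 5));
        System.out.println(distinctValuesByName(data));
    }
}
```

The per-group work is expressed by nesting collectors rather than by chaining stream operations, which is why it reads differently from the equivalent pipeline.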

Of course, a nested application of collectors will look entirely different from a chain of Stream operations doing the same…

If, however, you want to process complete groups, there is no way around completing the grouping collection first. This is not a limitation of the API but intrinsic to the grouping operation, or to any operation in general: if you want to process a complete result, you need to complete the operation first. Regardless of what the API looks like (e.g. you could hide the follow-up operation in the collector in a collectingAndThen-like manner), creating and populating the Map is unavoidable, as it is the Map that maintains the groups. The groups are determined by the keys and lookup logic of the Map, so using, e.g., a SortedMap with a custom comparator or an IdentityHashMap can change the grouping logic entirely.
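To illustrate how the Map's lookup logic determines the groups, here is a small sketch using the three-argument groupingBy overload with a map factory. The method names and the sample input are assumptions made for this example only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class MapDefinesGroupsExample {

    // Groups with the default map: keys are matched via equals/hashCode,
    // so "a" and "A" end up in different groups.
    static Map<String, Long> countExact(List<String> names) {
        return names.stream()
            .collect(Collectors.groupingBy(n -> n, Collectors.counting()));
    }

    // Same classifier, but the supplied TreeMap compares keys ignoring
    // case, so "a" and "A" fall into the same group.
    static Map<String, Long> countIgnoringCase(List<String> names) {
        return names.stream()
            .collect(Collectors.groupingBy(
                n -> n,
                () -> new TreeMap<String, Long>(String.CASE_INSENSITIVE_ORDER),
                Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "A", "b");
        System.out.println(countExact(names));
        System.out.println(countIgnoringCase(names));
    }
}
```

The classifier function is identical in both methods; only the map implementation differs, and that alone changes how many groups come out.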

Holger

As the API is right now, you can't escape it.

groupingBy only produces a Collector, and collect is a terminal operation (it does not return a Stream), so that operation will end the stream.

Depending on what you want to do later inside the last map operation, you could create a custom collector that will "stay" inside the stream, even though internally it would probably still gather the elements into a Map.
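One possible shape for such a collector, as a sketch only: wrap groupingBy in collectingAndThen so that the finisher reopens a stream over the map entries. `Chunk` is a hypothetical stand-in for the question's type, and `sumsByNameAsStream` and `namesWithSumAbove` are made-up helper names. The Map is still fully built before the returned stream produces anything.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GroupThenStreamExample {

    // Hypothetical stand-in for the Chunk type used in the question.
    static final class Chunk {
        final String name;
        final long value;

        Chunk(String name, long value) {
            this.name = name;
            this.value = value;
        }

        String getName() { return name; }
        long getValue() { return value; }
    }

    // Hypothetical helper: groups and sums as in the question, then hands
    // back a new Stream over the resulting entries. The intermediate Map
    // is hidden inside the collector, not avoided.
    static Collector<Chunk, ?, Stream<Map.Entry<String, Long>>> sumsByNameAsStream() {
        return Collectors.collectingAndThen(
            Collectors.groupingBy(Chunk::getName, Collectors.summingLong(Chunk::getValue)),
            map -> map.entrySet().stream());
    }

    // The call chain reads as one pipeline, although collect(...) is
    // still the point where the first stream ends.
    static List<String> namesWithSumAbove(List<Chunk> data, long threshold) {
        return data.stream()
            .collect(sumsByNameAsStream())
            .filter(e -> e.getValue() > threshold)
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Chunk> data = Arrays.asList(
            new Chunk("a", 1), new Chunk("a", 2),
            new Chunk("b", 5), new Chunk("c", 1));
        System.out.println(namesWithSumAbove(data, 2));
    }
}
```

This only changes how the code reads; it is equivalent to calling entrySet().stream() on the collected Map by hand.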

Eugene
  • Is there any collector operation which is not terminal? If not, I assume this is by design, so it would be better to manually open a new stream, am I right? – Lukas Körfer Jan 09 '17 at 13:43