I'm failing to understand the exact use case for Collectors.groupingByConcurrent.
From the JavaDocs:
Returns a concurrent Collector implementing a cascaded "group by" operation on input elements of type T...
This is a concurrent and unordered Collector.
...
Maybe the key phrase here is cascaded "group by". Does that point to something in how the collector actually performs the accumulation? (I looked at the source, but it got intricate very quickly.)
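To make the "concurrent and unordered" part of the JavaDoc concrete, here is a minimal sketch that prints the characteristics each collector reports. The class and method names are hypothetical, and the exact contents of each set may depend on the downstream collector and JDK version, so no output is claimed beyond what the JavaDoc states.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original question.
class CollectorCharacteristicsDemo {

    // Characteristics reported by the plain grouping collector.
    static Set<Collector.Characteristics> groupingByFlags() {
        Collector<Integer, ?, Map<Integer, Long>> c =
                Collectors.groupingBy(i -> i % 10, Collectors.counting());
        return c.characteristics();
    }

    // Characteristics reported by the concurrent grouping collector.
    static Set<Collector.Characteristics> groupingByConcurrentFlags() {
        Collector<Integer, ?, ConcurrentMap<Integer, Long>> c =
                Collectors.groupingByConcurrent(i -> i % 10, Collectors.counting());
        return c.characteristics();
    }

    public static void main(String[] args) {
        System.out.println("groupingBy:           " + groupingByFlags());
        System.out.println("groupingByConcurrent: " + groupingByConcurrentFlags());
    }
}
```

The CONCURRENT and UNORDERED flags are what the stream implementation can inspect when deciding how to run a parallel collect.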
When I test it with a fake ConcurrentMap:
    class FakeConcurrentMap<K, V> extends HashMap<K, V>
            implements ConcurrentMap<K, V> {}
I see that it breaks (gives wrong aggregations as the map isn't thread-safe) with parallel streams:
    Map<Integer, Long> counts4 = IntStream.range(0, 1000000)
            .boxed()
            .parallel()
            .collect(
                    Collectors.groupingByConcurrent(i -> i % 10,
                            FakeConcurrentMap::new,
                            Collectors.counting()));
Without .parallel(), results are consistently correct. So it seems that groupingByConcurrent goes with parallel streams.
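For comparison, this is a minimal sketch of the same parallel pipeline with a genuinely thread-safe map, swapping in java.util.concurrent.ConcurrentHashMap for the fake map. The class and method names are hypothetical; everything else mirrors the snippet above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original question.
class RealConcurrentMapDemo {

    // Same pipeline as counts4, but backed by a thread-safe ConcurrentHashMap.
    static ConcurrentMap<Integer, Long> countWithRealConcurrentMap() {
        return IntStream.range(0, 1000000)
                .boxed()
                .parallel()
                .collect(Collectors.groupingByConcurrent(
                        i -> i % 10,
                        ConcurrentHashMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        // Each of the 10 keys should map to 100000.
        System.out.println(countWithRealConcurrentMap());
    }
}
```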
But, as far as I can see, the following parallel stream collected with groupingBy always produces correct results:
    Map<Integer, Long> counts3 = IntStream.range(0, 1000000)
            .boxed()
            .parallel()
            .collect(
                    Collectors.groupingBy(i -> i % 10,
                            HashMap::new,
                            Collectors.counting()));
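One way to probe why the plain groupingBy pipeline stays correct with a HashMap is to count how often the map factory is invoked. The sketch below rests on an assumption taken from the Collector documentation's description of concurrent reduction: a non-concurrent collector gets a separate container per parallel subtask (merged afterwards), while a concurrent collector accumulates into one shared container. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original question.
class FactoryCallDemo {

    // Counts map-factory invocations for the non-concurrent collector.
    static int plainFactoryCalls() {
        AtomicInteger calls = new AtomicInteger();
        Map<Integer, Long> counts = IntStream.range(0, 1000000)
                .boxed()
                .parallel()
                .collect(Collectors.groupingBy(
                        i -> i % 10,
                        () -> {
                            calls.incrementAndGet();
                            return new HashMap<Integer, Long>();
                        },
                        Collectors.counting()));
        return calls.get();
    }

    // Counts map-factory invocations for the concurrent collector.
    static int concurrentFactoryCalls() {
        AtomicInteger calls = new AtomicInteger();
        ConcurrentMap<Integer, Long> counts = IntStream.range(0, 1000000)
                .boxed()
                .parallel()
                .collect(Collectors.groupingByConcurrent(
                        i -> i % 10,
                        () -> {
                            calls.incrementAndGet();
                            return new ConcurrentHashMap<Integer, Long>();
                        },
                        Collectors.counting()));
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("groupingBy factory calls:           " + plainFactoryCalls());
        System.out.println("groupingByConcurrent factory calls: " + concurrentFactoryCalls());
    }
}
```

If the assumption holds, the concurrent collector creates a single map, while the count for groupingBy depends on how the stream happens to be split.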
So when is it correct to use groupingByConcurrent instead of groupingBy (surely it can't be just to get the groupings as a concurrent map)?