
I'm trying to understand the implementation of downstream reduction in the JDK. Here it is:

   public static <T, K, D, A, M extends Map<K, D>>
    Collector<T, ?, M> groupingBy(Function<? super T, ? extends K> classifier,
                                  Supplier<M> mapFactory,
                                  Collector<? super T, A, D> downstream) {
        Supplier<A> downstreamSupplier = downstream.supplier();
        BiConsumer<A, ? super T> downstreamAccumulator = downstream.accumulator();
        BiConsumer<Map<K, A>, T> accumulator = (m, t) -> {
            K key = Objects.requireNonNull(classifier.apply(t), "element cannot be mapped to a null key");
            A container = m.computeIfAbsent(key, k -> downstreamSupplier.get());
            downstreamAccumulator.accept(container, t);
        };
        BinaryOperator<Map<K, A>> merger = Collectors.<K, A, Map<K, A>>mapMerger(downstream.combiner());
        @SuppressWarnings("unchecked")
        Supplier<Map<K, A>> mangledFactory = (Supplier<Map<K, A>>) mapFactory;

        if (downstream.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)) {
            return new CollectorImpl<>(mangledFactory, accumulator, merger, CH_ID);
        }
        else {
            @SuppressWarnings("unchecked")
            Function<A, A> downstreamFinisher = 
                       (Function<A, A>) downstream.finisher();  //1, <------------- HERE
            Function<Map<K, A>, M> finisher = intermediate -> {
                intermediate.replaceAll((k, v) -> downstreamFinisher.apply(v));
                @SuppressWarnings("unchecked")
                M castResult = (M) intermediate;
                return castResult;
            };
            return new CollectorImpl<>(mangledFactory, accumulator, merger, finisher, CH_NOID);
        }
    }

At //1, the downstreamFinisher is of type Function<A, D>. Judging by the type parameter declarations <T, K, D, A, M extends Map<K, D>>, the type D does not depend on A. So why do we cast it to Function<A, A>? I think the type D may not even be a subclass of A.

What did I miss?


1 Answer


This collector actually violates the generic type safety of the Map if the downstream collector doesn’t have an identity finisher and its finisher function returns a different type than the intermediate container type.

During the collect operation, the map will hold objects of type A, i.e. the intermediate container type. At the end of the operation, groupingBy’s finisher goes through the map, applies the downstream finisher function to each value, and replaces it with the final result.

Of course, this can’t be implemented without unchecked operations. There are multiple ways to do it; the variant you have posted changes the type of the map supplier from Supplier<M> to Supplier<Map<K, A>> (the first unchecked operation), so the compiler accepts that the map will hold values of type A rather than D. That’s why the finisher function must be changed to Function<A, A> (the second unchecked operation), so it can be used in the map’s replaceAll operation, which appears to require objects of type A even though the values are actually of type D afterwards. Finally, the result map must be cast to M (the third unchecked operation) to get an object of the expected result type M, which the supplier actually supplied.
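
To see what that means at runtime, here is a simplified, hypothetical sketch (not the JDK code) of a map whose values are intermediate long[] counters, roughly the container counting() uses on Java 9+, being finished in place:

import java.util.HashMap;
import java.util.Map;

public class GroupingByFinishSketch {
    public static void main(String[] args) {
        // At runtime the map's values are the intermediate containers (A),
        // here a one-element long[] counter; the declared value type D is Long.
        Map<Integer, Object> map = new HashMap<>();

        // accumulation phase: create and mutate the container for key 3
        long[] container = (long[]) map.computeIfAbsent(3, k -> new long[1]);
        container[0]++;

        // finishing phase: replaceAll swaps each container for its finished
        // value in the very same map, which is then cast to M extends Map<K, D>
        map.replaceAll((k, v) -> ((long[]) v)[0]);

        System.out.println(map); // {3=1}
    }
}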

The correct type safe alternative would be to use different maps and perform the finishing operation by populating the result map with the converted values of the intermediate map. Not only might this be an expensive operation, it would also require a second supplier for the intermediate map, as the provided supplier only produces maps suitable for the final result. So apparently, the developers decided that this is an acceptable breach of type safety.
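
Such a type safe variant could look roughly like this sketch (hypothetical code, not the JDK implementation), copying the finished values into a fresh map of the declared result type:

import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

class TypeSafeGroupingSketch {
    // Hypothetical alternative finisher: instead of mutating the intermediate
    // map in place, build a fresh map of the declared result type M. The
    // intermediate Map<K, A> would still need its own supplier elsewhere.
    static <K, D, A, M extends Map<K, D>> Function<Map<K, A>, M> typeSafeFinisher(
            Supplier<M> mapFactory, Function<A, D> downstreamFinisher) {
        return intermediate -> {
            M result = mapFactory.get();
            intermediate.forEach((k, a) -> result.put(k, downstreamFinisher.apply(a)));
            return result;
        };
    }
}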

Note that you can observe the unsafe operation when you try to use a Map implementation which enforces type safety:

Stream.of("foo", "bar").collect(Collectors.groupingBy(String::length,
    () -> Collections.checkedMap(new HashMap<>(), Integer.class, Long.class),
    Collectors.counting()));

will produce a ClassCastException with this implementation because it tries to put an intermediate container (an array) instead of the Long into the Map.
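
By contrast, a reduction-style collector whose intermediate values already have the final type (so A and D coincide as Long) never exposes a foreign type to the checked map. A minimal sketch using the four-argument toMap (assuming the usual java.util and java.util.stream imports):

// toMap accumulates directly with boxed Long values, so the checked map
// only ever sees Integer keys and Long values and the collect succeeds.
Map<Integer, Long> counts = Stream.of("foo", "bar").collect(Collectors.toMap(
    String::length,
    s -> 1L,
    Long::sum,
    () -> Collections.checkedMap(new HashMap<>(), Integer.class, Long.class)));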

  • +1000. I got here from your comment at https://stackoverflow.com/questions/40386197/classcastexception-when-using-custom-map-supplier-in-grouping-by and I just have to say that this implementation, apparently official, is damned **INSANE** (insert lots and lots of Zalgo here!). Yet additional proof, if any was needed, that the entire Streams API in Java is hot stinking garbage. (There are _lots_ of other reasons. Just compare it to LINQ and you'll discover them all.) That the `mapFactory` must be empty - documented, yes, but very quietly - is stupid and the `ClassCastException` you ... – davidbak Dec 27 '22 at 23:43
  • ... get at as result is _impossible_ to figure out since there is, of course, no `[J` == `long[]` _anywhere to be found in your code!_ Thank you for this answer. – davidbak Dec 27 '22 at 23:46
  • BTW, the 4-parameter `Collectors.toMap` which takes a `mapFactory` _also_ documents the restriction that the collection be empty but apparently doesn't use this bloody abortion of misbegotten unchecked code in its implementation: The equivalent code using that `toMap` _works just fine_. But apparently, that's only an accident and could change at any time that the brain-damaged author of this code decides to have another go at it because he did, stupidly but carefully, document that it's not supposed to work. (He didn't however insert an `assert(map.isEmpty())` to _enforce_ the restriction.) – davidbak Dec 27 '22 at 23:55
  • (see https://jdoodle.com/ia/BD7 for an example using the 4-parameter `toMap` that works and this `groupingBy`+`counting` that blows up) – davidbak Dec 27 '22 at 23:58
  • @davidbak the `toMap` collector performs a Reduction where intermediate values have the same type as the final result, i.e. all accumulation is done with boxed `Long` objects when summing up. Hence, you have no type issues but some boxing overhead with really large groups. In contrast, the `counting()` collector uses a container for a mutable `long` value (Java 9+) which must be replaced with the actual `Long` result at the end. That’s a fundamentally different approach (see also https://stackoverflow.com/q/57041896/2711488) and you can safely assume `toMap` to never change in this regard. – Holger Jan 02 '23 at 11:31
  • very helpful comment (and linked answer+comments). (Just to be clear, my main issue in _this particular_ Q/A is the _exploiting for profit_ type erasure which I have heretofore considered a _very bad idea_ - it's like the ultimate exposed implementation detail that you should never exploit (though you have to remember it, otherwise you can't understand lots of other issues you run in to...). The history of Java generics is a sad one. Lots of _great_ academic work went into developing it given the constraints they were given, but because of those constraints it's just non optimal.) – davidbak Jan 02 '23 at 19:08
  • @davidbak I understand your objections regarding the design, but there’s nothing I could add. So I just focused on explaining as much as I can about how it works and how to deal with it, pragmatically. – Holger Jan 03 '23 at 08:18