
A Collector has three generic types:

public interface Collector<T, A, R>

Here, A is the mutable accumulation type of the reduction operation (often hidden as an implementation detail).

If I want to create my custom collector, I need to create two classes:

  • one for the custom accumulation type
  • one for the custom collector itself

Is there any library function/trick that takes the accumulation type and provides a corresponding Collector?

Simple example

This example is deliberately simple to illustrate the question. I know I could use reduce for this case, but that is not what I am looking for. Here is a more complex example; including it here would make the question too long, but it is the same idea.

Let's say I want to collect the sum of a stream and return it as a String.

I can implement my accumulator class:

public static class SumCollector {
    Integer value;

    public SumCollector(Integer value) {
        this.value = value;
    }

    public static SumCollector supply() {
        return new SumCollector(0);
    }

    public void accumulate(Integer next) {
        value += next;
    }

    public SumCollector combine(SumCollector other) {
        return new SumCollector(value + other.value);
    }

    public String finish() {
        return Integer.toString(value);
    }
}

And then I can create a Collector from this class:

Collector.of(SumCollector::supply, SumCollector::accumulate, SumCollector::combine, SumCollector::finish);

But it seems strange to me that they all refer to the other class; I feel that there should be a more direct way to do this.

What I could do to keep only one class would be to implement Collector<Integer, SumCollector, String>, but then every function would be duplicated (supplier() would return SumCollector::supply, etc.).

Ricola
  • I think you always need two classes. One will always be the accumulator object. And one will implement the `Collector` interface. But the accumulator object **doesn't** contain all those `supply()`, `combine()` and `finish()` methods. They would only be available in the class implementing the `Collector`. The holder class may also be a private inner `class` in the collector. Also for your example you could just use `AtomicInteger` as the accumulator. Leaving you with a single class `SumCollector implements Collector` that you'd have to implement – Lino Nov 11 '22 at 14:40
  • "The holder class may also be a private inner class in the collector." => I don't think I can do this as if I do `implements Collector`, I get `SumCollector.Acc' has private access in 'SumCollector'`. – Ricola Nov 11 '22 at 14:55
  • Oh yeah, then it sadly must be `public`. You could also invert the whole class structure. Make the `Collector` a private inner class of the accumulator. And then expose it only with a static method: `public static Collector collector() {return new SumCollector();}` – Lino Nov 11 '22 at 15:01

3 Answers

There is no requirement for the functions to be implemented as methods of the container class.

This is how such a sum collector would be typically implemented

public static Collector<Integer, ?, Integer> sum() {
    return Collector.of(() -> new int[1],
        (a, i) -> a[0] += i,
        (a, b) -> { a[0] += b[0]; return a; },
        a -> a[0],
        Collector.Characteristics.UNORDERED);
}

But, of course, you could also implement it as

public static Collector<Integer, ?, Integer> sum() {
    return Collector.of(AtomicInteger::new,
        AtomicInteger::addAndGet,
        (a, b) -> { a.addAndGet(b.intValue()); return a; },
        AtomicInteger::intValue,
        Collector.Characteristics.UNORDERED, Collector.Characteristics.CONCURRENT);
}

You first have to find a suitable mutable container type for your collector. If no such type exists, you have to create your own class. The functions can be implemented as a method reference to an existing method or as a lambda expression.
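As a minimal sketch of that idea applied to the question's own task (sum the elements, return the result as a String), an int[1] array can serve as the existing mutable container. The class name SumAsStringExample and the factory name sumAsString are hypothetical, chosen only for illustration; the explicit type arguments on Collector.of are there to make the container type visible, not because they are required.

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

final class SumAsStringExample {

    // Hypothetical factory method; callers only see Collector<Integer, ?, String>,
    // so the int[1] container stays an implementation detail.
    static Collector<Integer, ?, String> sumAsString() {
        return Collector.<Integer, int[], String>of(
            () -> new int[1],                        // supplier: a fresh container per request
            (a, i) -> a[0] += i,                     // accumulator: fold one element in
            (a, b) -> { a[0] += b[0]; return a; },   // combiner: merge two partial sums
            a -> Integer.toString(a[0]));            // finisher: container -> String
    }

    public static void main(String[] args) {
        System.out.println(Stream.of(1, 2, 3, 4).collect(sumAsString())); // prints 10
    }
}
```

No named accumulator class is involved here; the four functions are plain lambdas over an array.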

For the more complex example, I don’t know of a suitable existing type for holding an int and a List, but you may get away with a Map.Entry holding a List and a boxed Integer, like this:

final Map<String, Integer> map = …
List<String> keys = map.entrySet().stream().collect(keysToMaximum());

public static <K> Collector<Map.Entry<K,Integer>, ?, List<K>> keysToMaximum() {
    return Collector.of(
        () -> new AbstractMap.SimpleEntry<>(new ArrayList<K>(), Integer.MIN_VALUE),
        (current, next) -> {
            int max = current.getValue(), value = next.getValue();
            if(value >= max) {
                if(value > max) {
                    current.setValue(value);
                    current.getKey().clear();
                }
                current.getKey().add(next.getKey());
            }
        }, (a, b) -> {
            int maxA = a.getValue(), maxB = b.getValue();
            if(maxA < maxB) return b;
            if(maxA == maxB) a.getKey().addAll(b.getKey());
            return a;
        },
        Map.Entry::getKey
    );
}

But you may also create a new dedicated container class as an ad-hoc type, not visible outside the particular collector

public static <K> Collector<Map.Entry<K,Integer>, ?, List<K>> keysToMaximum() {
    return Collector.of(() -> new Object() {
        int max = Integer.MIN_VALUE;
        final List<K> keys = new ArrayList<>();
    }, (current, next) -> {
        int value = next.getValue();
        if(value >= current.max) {
            if(value > current.max) {
                current.max = value;
                current.keys.clear();
            }
            current.keys.add(next.getKey());
        }
    }, (a, b) -> {
        if(a.max < b.max) return b;
        if(a.max == b.max) a.keys.addAll(b.keys);
        return a;
    },
    a -> a.keys);
}

The takeaway is, you don’t need to create a new, named class to create a Collector.
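For a concrete check, here is a self-contained sketch of the Map.Entry-based variant with a small sample map. The class name KeysToMaximumExample and the sample data are assumptions made for illustration; the explicit type arguments on Collector.of only spell out the container type, and the combiner merges the key lists when both partial results share the same maximum.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;

final class KeysToMaximumExample {

    // Container: a Map.Entry whose key is the list of winning keys and whose value is the current maximum.
    static <K> Collector<Map.Entry<K, Integer>, ?, List<K>> keysToMaximum() {
        return Collector.<Map.Entry<K, Integer>, Map.Entry<List<K>, Integer>, List<K>>of(
            () -> new AbstractMap.SimpleEntry<>(new ArrayList<K>(), Integer.MIN_VALUE),
            (current, next) -> {
                int max = current.getValue(), value = next.getValue();
                if (value >= max) {
                    if (value > max) {           // new maximum: drop the keys collected so far
                        current.setValue(value);
                        current.getKey().clear();
                    }
                    current.getKey().add(next.getKey());
                }
            },
            (a, b) -> {
                int maxA = a.getValue(), maxB = b.getValue();
                if (maxA < maxB) return b;                        // b wins outright
                if (maxA == maxB) a.getKey().addAll(b.getKey());  // tie: keep keys of both
                return a;
            },
            Map.Entry::getKey);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>();  // sample data, insertion-ordered
        map.put("a", 3);
        map.put("b", 5);
        map.put("c", 5);
        List<String> keys = map.entrySet().stream().collect(keysToMaximum());
        System.out.println(keys); // prints [b, c]
    }
}
```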

Holger
  • Interesting usage of the ad hoc anonymous class. There's no way to reference the type of that class, but java infers the type correctly. For readability though it is recommended to create a named class. – Lino Nov 11 '22 at 15:11
  • @Lino I’d say, if the functions are short and close to each other, so you can overview the declaration and all uses at a glance, it’s acceptable. This specific example, with longer functions, is already borderline. It’s more for completeness. – Holger Nov 11 '22 at 15:18
  • My goal is not a one-liner, nor to get rid of the accumulator class; my goal is to implement a `Collector` by implementing the accumulator class only, if possible. But from the answers, this doesn't look possible. In my opinion, then, `Collector.of` (or `.collect(...)`) is the cleanest/most readable option. – Ricola Nov 11 '22 at 15:29
  • @Ricola it’s not clear to me, what you mean with “accumulator class only”. Something like the `SummaryStatistics` of [this answer](https://stackoverflow.com/a/51378142/2711488) perhaps? It serves as container class and provides a factory method for the collector. In case of already existing or non-exposed container classes, you’d only provide the factory methods, just like in [`Collectors`](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/stream/Collectors.html). – Holger Nov 11 '22 at 15:42
  • Something like `Collector.fromAccumulator(new SumCollector())`, but it obviously doesn't exist. – Ricola Nov 11 '22 at 16:20
  • There’s a reason why a collector has a supplier for the container. It must be capable of producing multiple instances when needed, e.g. when used in combination with `groupingBy` or for parallel evaluation. Creating a collector from a single instance would contradict the whole concept. Another contradiction is that users of a collector do *not* want to deal with the temporary container. In both of your examples, the final result type is different from the container type and I even provided two collector implementations with different container types, for each example. – Holger Nov 11 '22 at 17:39
  • "users of a collector do not want to deal with the temporary container" oh yes I agree, this is why I would have loved to have a way to encapsulate the container logic **inside** the custom `Collector` class. To illustrate, @Lino's suggestion under the question: "container class may also be a private inner class in the collector" would have been great if it was possible. And the alternative is to have the container class public, which I don't like, for this exact reason of wanting to have the container implementation not leaking outside of the `Collector` class. – Ricola Nov 11 '22 at 18:19
  • Why do you claim that encapsulating the container class was not possible? All of my four examples are capable of hiding the container class. The first two literally use `Collector` as the factory method’s return type, the same way all factory methods in [`Collectors`](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/stream/Collectors.html) do. The caller doesn’t see the actual container class, hence, it can be anything, including private inner classes. In my fourth example, the container class **is** a private inner class, even an anonymous class. – Holger Nov 11 '22 at 18:27
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/249522/discussion-between-ricola-and-holger). – Ricola Nov 11 '22 at 18:40

It sounds like you want to supply only the reduction function itself, not all of the other things that come with a generic Collector. Perhaps you're looking for Collectors.reducing.

public static <T> Collector<T,?,T> reducing(T identity, BinaryOperator<T> op)

Then, to sum values, you would write

Collectors.reducing(0, (x, y) -> x + y);

or, in context,

Integer[] myList = new Integer[] { 1, 2, 3, 4 };
var collector = Collectors.reducing(0, (x, y) -> x + y);
System.out.println(Stream.of(myList).collect(collector)); // Prints 10
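
If the result should be a String, as in the question, one option is to wrap the reducing collector in Collectors.collectingAndThen. This is only a sketch; the class name ReducingToStringExample and the factory name sumAsString are hypothetical.

```java
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ReducingToStringExample {

    // Hypothetical factory: sums with an immutable reduction, then converts the Integer result to a String.
    static Collector<Integer, ?, String> sumAsString() {
        return Collectors.collectingAndThen(
            Collectors.reducing(0, Integer::sum),   // identity 0, associative operator
            sum -> Integer.toString(sum));          // finishing transformation
    }

    public static void main(String[] args) {
        System.out.println(Stream.of(1, 2, 3, 4).collect(sumAsString())); // prints 10
    }
}
```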
Silvio Mayolo
  • Just as a side note: Instead of a reducing collector, one could probably also use just the `reduce` method. – Lino Nov 11 '22 at 14:25
  • I provided a simple example on purpose, and I did write "I know I could use reduce for this case". Please look at https://stackoverflow.com/questions/74401764/how-to-collect-same-maximum-numbers-keys-as-list-in-java-8/74401856#74401856 for a more detailed example – Ricola Nov 11 '22 at 14:59
  • The full `Collector` API is verbose by *design*. If you're doing messy things with mutable state during reduction, you want your code to send up giant red signal flares screaming "I am mutable, read me carefully". If your reduction function is nice and referentially transparent, then it can absolutely be a one-liner. But if it's messy and complicated, then it *should* be a separate class. – Silvio Mayolo Nov 11 '22 at 15:11

I want to focus on the wording of one point of your question, because I feel like it could be the crux of the underlying confusion.

If I want to create my custom collector, I need to create two classes:

  • one for the custom accumulation type
  • one for the custom collector itself

No, you need to create only one class, that of your custom accumulator. You should use the appropriate factory method to instantiate your custom Collector, as you demonstrate yourself in the question.

Perhaps you meant to say that you need to create two instances. And that is also incorrect; you need to create a Collector instance, but to support the general case, many instances of the accumulator can be created (e.g., groupingBy()). Thus, you can't simply instantiate the accumulator yourself, you need to provide its Supplier to the Collector, and delegate to the Collector the ability to instantiate as many instances as required.
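
To make that concrete, here is a small sketch that counts how often the supplier is invoked when a collector is used downstream of groupingBy. The class name SupplierCallsExample, the method names, and the counter parameter are all hypothetical and exist only for this illustration.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SupplierCallsExample {

    // A sum collector whose supplier increments the given counter each time a container is requested.
    static Collector<Integer, ?, Integer> countingSum(AtomicInteger created) {
        return Collector.<Integer, int[], Integer>of(
            () -> { created.incrementAndGet(); return new int[1]; },
            (a, i) -> a[0] += i,
            (a, b) -> { a[0] += b[0]; return a; },
            a -> a[0]);
    }

    // Groups 1..6 by remainder modulo 3 and sums each group with the counting collector.
    static Map<Integer, Integer> sumByRemainder(AtomicInteger created) {
        return Stream.of(1, 2, 3, 4, 5, 6)
            .collect(Collectors.groupingBy(i -> i % 3, countingSum(created)));
    }

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        Map<Integer, Integer> sums = sumByRemainder(created);
        System.out.println(sums.get(0) + " " + sums.get(1) + " " + sums.get(2)); // prints 9 5 7
        System.out.println(created.get()); // prints 3: one container per group
    }
}
```

A single pre-built accumulator instance could not serve all three groups, which is why the collector is handed a Supplier rather than an instance.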

Now, think about the overloaded Collector.of() method you feel is missing, the "more direct way to do this." Clearly, such a method would still require a Supplier, one that would create instances of your custom accumulator. But Stream.collect() needs to interact with your custom accumulator instances, to perform accumulate and combine operations. So the Supplier would have to instantiate something like this Accumulator interface:

public interface Accumulator<T, A extends Accumulator<T, A, R>, R> {

    /**
     * @param t a value to be folded into this mutable result container
     */
    void accumulate(T t);

    /**
     * @param that another partial result to be merged with this container
     * @return the combined results, which may be {@code this}, {@code that}, or a new container
     */
    A combine(A that);

    /**
     * @return the final result of transforming this intermediate accumulator
     */
    R finish();

}

With that, it's then straightforward to create Collector instances from a Supplier<Accumulator>, using a static factory method declared on the Accumulator interface:

    static <T, A extends Accumulator<T, A, R>, R> 
    Collector<T, ?, R> of(Supplier<A> supplier, Collector.Characteristics ... characteristics) {
        return Collector.of(supplier, 
                            Accumulator::accumulate, 
                            Accumulator::combine, 
                            Accumulator::finish, 
                            characteristics);
    }

Then, you'd be able to define your custom Accumulator:

final class Sum implements Accumulator<Integer, Sum, String> {

    private int value;

    @Override
    public void accumulate(Integer next) {
        value += next;
    }

    @Override
    public Sum combine(Sum that) {
        value += that.value;
        return this;
    }

    @Override
    public String finish(){
        return Integer.toString(value);
    }

}

And use it:

String sum = ints.stream().collect(Accumulator.of(Sum::new, Collector.Characteristics.UNORDERED));

Now… it works, and there's nothing too horrible about it, but is all the Accumulator<A extends Accumulator<A>> mumbo-jumbo "more direct" than this?

final class Sum {

    private int value;

    private void accumulate(Integer next) {
        value += next;
    }

    private Sum combine(Sum that) {
        value += that.value;
        return this;
    }

    @Override
    public String toString() {
        return Integer.toString(value);
    }

    static Collector<Integer, ?, String> collector() {
        return Collector.of(Sum::new, Sum::accumulate, Sum::combine, Sum::toString, Collector.Characteristics.UNORDERED);
    }

}

And really, why have an Accumulator dedicated to collecting to a String? Wouldn't reduction to a custom type be more interesting? Something along the lines of IntSummaryStatistics, which has other useful methods like getAverage() alongside toString()? This approach is a lot more powerful, requires only one (mutable) class (the result type), and can encapsulate all of its mutators as private methods rather than implementing a public interface.
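
As a brief illustration of a container that doubles as the result type, the sketch below collects an IntStream into an IntSummaryStatistics using the three-argument collect method. The class name SummaryStatisticsExample and the helper name summarize are hypothetical.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

final class SummaryStatisticsExample {

    // The mutable container is also the final result, so no finisher is needed.
    static IntSummaryStatistics summarize(int... values) {
        return IntStream.of(values).collect(
            IntSummaryStatistics::new,       // supplier
            IntSummaryStatistics::accept,    // accumulator: record one int
            IntSummaryStatistics::combine);  // combiner: merge another partial result
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = summarize(1, 2, 3, 4);
        System.out.println(stats.getSum());     // prints 10
        System.out.println(stats.getAverage()); // prints 2.5
    }
}
```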

So, you're welcome to use something like Accumulator, but it doesn't really fill a real gap in the core Collector repertoire.

erickson
  • "why have an Accumulator dedicated to collecting to a String" => this was just for the sake of the example, of course. It's good that you mention `IntSummaryStatistics` because I see they do `collect(IntSummaryStatistics::new, IntSummaryStatistics::accept, IntSummaryStatistics::combine)`, which hints that such a method _could_ have been useful. But if we wanted to have such a method/interface in the standard library, there would need to be one per Stream type, which, as you pointed out, is likely not worth it. – Ricola Nov 12 '22 at 11:30