I'm looking for a way to limit the number of entries produced by Collectors.toMap() with a merge function. Consider the following example:

Map<String, Integer> m = Stream.of("a", "a", "b", "c", "d")
    .limit(3)
    .collect(toMap(Function.identity(), s -> 1, Integer::sum));

The problem with the above is that I'll only have 2 elements in the resulting map (a=2, b=1). Is there any convenient way to short-circuit the stream once it's processed 3 distinct keys?

shmosel
  • Can't you just filter? – SamTebbs33 Sep 02 '16 at 22:10
  • Could you do something like this? Not sure if it's allowed. `Map m = Stream.of("a", "a", "b", "c", "d") .limit(m.size() < 3 ? Integer.MAX_VALUE : 3) .collect(toMap(Function.identity(), 1, Integer::sum));` – nbokmans Sep 02 '16 at 22:12
  • *"Is there a way?"* Yes, write your own `toMap()` collector. Is that *convenient*? Matter of perspective, I guess. – Andreas Sep 02 '16 at 22:13
  • @Andreas Is it possible to write a short-circuiting collector? Do you mind posting an example? – shmosel Sep 02 '16 at 22:18
  • I guess one problem could be that, once the point of 3 distinct keys is reached, there might still be remaining elements in the stream with the same key, that would not get added into the map. Or is that intended? – Jorn Vernee Sep 02 '16 at 22:18
  • @JornVernee I'm not concerned about the remaining elements. – shmosel Sep 02 '16 at 22:19
  • So... `["a", "a", "b", "a", "b"]` in your case should produce `{a->2, b->1}` or `{a->3, b->2}`? – vsminkov Sep 02 '16 at 22:25
  • @vsminkov `{a=3, b=2}` because the limit of 3 keys was never reached. If the limit is reached, it could stop immediately, though I don't mind merging consecutive duplicates either. – shmosel Sep 02 '16 at 22:29
  • @shmosel is this what you're looking for: http://stackoverflow.com/questions/25441088/group-by-counting-in-java8-stream-api It shows how to group and count using a stream, which sounds like the problem you are trying to tackle, though it doesn't limit the results. – VLAZ Sep 02 '16 at 22:31
  • @Vld Not at all what I'm looking for. The counter is just demonstration of a merge function. Though I guess a similar question could be asked about `groupingBy()`. – shmosel Sep 02 '16 at 22:33
  • @shmosel then you should truncate the resulting map, because you cannot guarantee that you have collected all repeating values until you have iterated the whole stream – vsminkov Sep 02 '16 at 22:34
  • No, you can't short-circuit collectors. – Louis Wasserman Sep 02 '16 at 22:34
  • @vsminkov I said I don't care about subsequent duplicates. – shmosel Sep 02 '16 at 22:35

1 Answer

A possible solution would be to write your own Spliterator that wraps the spliterator of a given Stream. This Spliterator delegates the advancing calls to the wrapped spliterator and contains the logic for counting how many distinct elements have appeared.

For that, we can subclass AbstractSpliterator and provide our own tryAdvance logic. In the following, every element encountered is added to a set. When the size of that set becomes greater than our maximum, or when the wrapped spliterator has no remaining elements, we return false to indicate that there are no remaining elements to consider. Traversal therefore stops as soon as an element that would push the number of distinct elements past the maximum is encountered, and that element is not passed downstream.

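import java.util.HashSet;
import java.util.Set;
import java.util.Spliterator;
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

private static <T> Stream<T> distinctLimit(Stream<T> stream, int max) {
    Spliterator<T> spltr = stream.spliterator();
    // The result may hold fewer elements than the source, so it must not report SIZED or SUBSIZED.
    int characteristics = spltr.characteristics() & ~(Spliterator.SIZED | Spliterator.SUBSIZED);
    Spliterator<T> res = new AbstractSpliterator<T>(spltr.estimateSize(), characteristics) {

        // Distinct elements seen so far.
        private final Set<T> distincts = new HashSet<>();
        private boolean stillGoing = true;

        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            if (!stillGoing) {
                return false; // limit already hit; do not consume more of the source
            }
            boolean hasRemaining = spltr.tryAdvance(elem -> {
                distincts.add(elem);
                if (distincts.size() > max) {
                    // elem is the (max + 1)-th distinct value: drop it and stop
                    stillGoing = false;
                } else {
                    action.accept(elem);
                }
            });
            return hasRemaining && stillGoing;
        }
    };
    return StreamSupport.stream(res, stream.isParallel()).onClose(stream::close);
}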

With your example code, you would have:

Map<String, Long> m =
    distinctLimit(Stream.of("a", "a", "b", "c", "d"), 3)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

and the output would be the expected {a=2, b=1, c=1}, i.e. a map with 3 distinct keys.
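
As a variation, on Java 9 or later a stateful `takeWhile` predicate can give a similar effect with less code. This is a different technique from the Spliterator wrapper, and only a minimal sketch: it assumes a sequential, ordered stream, since stateful predicates are generally discouraged and would not be safe in parallel. The class name `TakeWhileDistinctLimit` and the helper `countUntilDistinctLimit` are made up for illustration. It uses the `toMap` collector with the `Integer::sum` merge function from the question.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TakeWhileDistinctLimit {

    // Hypothetical helper: counts occurrences per key and stops consuming
    // the stream as soon as a (max + 1)-th distinct key shows up.
    static <T> Map<T, Integer> countUntilDistinctLimit(Stream<T> stream, int max) {
        Set<T> seen = new HashSet<>();
        return stream
                .sequential() // the stateful predicate below is not safe in parallel
                .takeWhile(elem -> {
                    seen.add(elem);
                    return seen.size() <= max;
                })
                .collect(Collectors.toMap(Function.identity(), e -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        Map<String, Integer> m =
                countUntilDistinctLimit(Stream.of("a", "a", "b", "c", "d"), 3);
        System.out.println(m); // contains a=2, b=1, c=1
    }
}
```

If the limit is never reached, as with `["a", "a", "b", "a", "b"]` and a limit of 3, the whole stream is consumed and the result is a=3, b=2, which matches the behavior asked for in the comments.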

Tunaki