
I have a bean and a stream:

public class TokenBag {
    private String token;
    private int count;
    // Standard constructor and getters here
}
Stream<String> src = Stream.of("a", "a", "a", "b", "b", "a", "a");

and want to apply some intermediate operation to the stream that returns another stream of TokenBag objects. In this example there must be three: ("a", 3), ("b", 2) and ("a", 2).

Please consider this a very simplistic example. In reality there will be much more complicated logic than just counting the same values in a row. Actually I'm trying to design a simple parser that accepts a stream of tokens and returns a stream of objects.

Also please note that it must stay a stream (with no intermediate accumulation), and in this example it must really count the same values in a row (which differs from grouping).

I will appreciate your suggestions about the general approach to this task.

Nick Legend
  • https://stackoverflow.com/questions/25441088/group-by-counting-in-java-8-stream-api – Alexis C. Jan 25 '18 at 09:28
  • Guys, thanks for your answers!... but just some more details. I need it to stay a stream (there must be no terminal operation at the end), and also in this example it must really count the same values in a row (not grouping). So from "aaabbaa" it must return (a, 3), (b, 2) and (a, 2). – Nick Legend Jan 25 '18 at 09:43

5 Answers

Map<String, Long> result = src.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
System.out.println(result);

With the src stream from the question this prints:

{a=5, b=2}

You can then go ahead and iterate over the map and create TokenBag objects.
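That final step can be sketched as follows. This is only a sketch: the (String, int) constructor on TokenBag is an assumption, since the question only mentions a "standard constructor", and a minimal stand-in for the bean is included to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TokenBagFromMap {

    // Minimal stand-in for the bean from the question; the (String, int)
    // constructor is an assumption.
    static class TokenBag {
        private final String token;
        private final int count;

        TokenBag(String token, int count) {
            this.token = token;
            this.count = count;
        }

        String getToken() { return token; }
        int getCount()    { return count; }
    }

    // Iterate over the grouped counts and create one TokenBag per entry.
    static List<TokenBag> toBags(Map<String, Long> counted) {
        List<TokenBag> bags = new ArrayList<>();
        counted.forEach((token, count) -> bags.add(new TokenBag(token, count.intValue())));
        return bags;
    }

    public static void main(String[] args) {
        Map<String, Long> counted = new LinkedHashMap<>();
        counted.put("a", 5L);
        counted.put("b", 2L);
        toBags(counted).forEach(b -> System.out.println(b.getToken() + "=" + b.getCount()));
    }
}
```

Note that, as with the rest of this answer, this yields one TokenBag per distinct token, not one per run, so it does not satisfy the OP's "same values in a row" requirement.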

Rishikesh Dhokare
Stream<String> src = Stream.of("a", "a", "a", "a", "b", "b", "b");

// collect to a map of token -> count
Map<String, Long> counted = src
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

// map the entries to a list of TokenBags
List<TokenBag> tokenBags = counted.entrySet().stream()
        .map(m -> new TokenBag(m.getKey(), m.getValue().intValue()))
        .collect(Collectors.toList());
Senior Pomidor

First group it into a Map and then map the entries to TokenBags:

Map<String, Long> values = src.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
List<TokenBag> tokenBags = values.entrySet().stream().map(entry -> {
    TokenBag tb = new TokenBag();
    tb.setToken(entry.getKey());
    tb.setCount(entry.getValue().intValue());
    return tb;
}).collect(Collectors.toList());
Georg Leber

You'd need to convert your stream to a Spliterator and then adapt this spliterator to a custom one that partially reduces some elements according to your logic (in your example it would need to count equal elements until a different element appears). Then, you'd need to turn your spliterator back into a new stream.

Bear in mind that this can't be 100% lazy, as you'd need to eagerly consume some elements from the backing stream in order to create a new TokenBag element for the new stream.

Here's the code for the custom spliterator:

public class CountingSpliterator
        extends Spliterators.AbstractSpliterator<TokenBag>
        implements Consumer<String> {

    private final Spliterator<String> source;
    private String currentToken;
    private String previousToken;
    private int tokenCount = 0;
    private boolean tokenHasChanged;

    public CountingSpliterator(Spliterator<String> source) {
        super(source.estimateSize(), source.characteristics());
        this.source = source;
    }

    @Override
    public boolean tryAdvance(Consumer<? super TokenBag> action) {
        while (source.tryAdvance(this)) {
            if (tokenHasChanged) {
                action.accept(new TokenBag(previousToken, tokenCount));
                tokenCount = 1;
                return true;
            }
        }
        if (tokenCount > 0) {
            action.accept(new TokenBag(currentToken, tokenCount));
            tokenCount = 0;
            return true;
        }
        return false;
    }

    @Override
    public void accept(String newToken) {
        if (currentToken != null) {
            previousToken = currentToken;
        }
        currentToken = newToken;
        if (previousToken != null && !previousToken.equals(currentToken)) {
            tokenHasChanged = true;
        } else {
            tokenCount++;
            tokenHasChanged = false;
        }
    }
}

So this spliterator extends Spliterators.AbstractSpliterator and also implements Consumer. The code is quite complex, but the idea is that it adapts one or more tokens from the source spliterator into an instance of TokenBag.

For every accepted token from the source spliterator, the count for that token is incremented, until the token changes. At this point, a TokenBag instance is created with the token and the count and is immediately pushed to the Consumer<? super TokenBag> action parameter. Also, the counter is reset to 1. The logic in the accept method handles token changes, border cases, etc.

Here's how you should use this spliterator:

Stream<String> src = Stream.of("a", "a", "a", "b", "b", "a", "a");

Stream<TokenBag> stream = StreamSupport.stream(
        new CountingSpliterator(src.spliterator()),
        false); // false means sequential, we don't want parallel!

stream.forEach(System.out::println);

If you override toString() in TokenBag, the output is:

TokenBag{token='a', count=3}
TokenBag{token='b', count=2}
TokenBag{token='a', count=2}
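Such a toString() override might look like this (a sketch; the field names follow the bean from the question, and the constructor is again an assumption):

```java
public class TokenBag {
    private final String token;
    private final int count;

    public TokenBag(String token, int count) {
        this.token = token;
        this.count = count;
    }

    public String getToken() { return token; }
    public int getCount()    { return count; }

    // Produces output in the form shown above, e.g. TokenBag{token='a', count=3}
    @Override
    public String toString() {
        return "TokenBag{token='" + token + "', count=" + count + "}";
    }
}
```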

A note on parallelism: I don't know how to parallelize this partial-reduce task; I don't even know if it's possible at all. But even if it were, I doubt it would produce any measurable improvement.

fps

Create a map and then collect its entries into a list:

Stream<String> src = Stream.of("a", "a", "a", "a", "b", "b", "b");
Map<String, Long> m = src.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
List<TokenBag> tokenBags = m.entrySet().stream()
        .map(e -> new TokenBag(e.getKey(), e.getValue().intValue()))
        .collect(Collectors.toList());
St.Antario
    same comment as above... on Java 8, `Collectors.counting()` is worse than `Collectors.summingInt(x -> 1)` – Eugene Jan 25 '18 at 09:56
  • @Eugene Is it really that bad to use `Collectors.counting()`? I ask because I've seen you and others claiming it shouldn't be used due to performance reasons, while we're talking about streams here (not precisely the peak of performance)... – fps Jan 29 '18 at 13:17
  • @FedericoPeraltaSchaffner it's not, just boxing, lots of it potentially. If you can do it better, why not? – Eugene Jan 29 '18 at 13:18
  • @Eugene Well, I understand that boxing can happen a lot of times, but I'd really keep `Collectors.counting()` and wait for the fix instead of using `Collectors.summingInt(x -> 1)`. It looks so ugly... – fps Jan 29 '18 at 13:20
  • @Eugene Did not know that. Did not measure... How much worse? Are there some benchmarks for that? – St.Antario Jan 29 '18 at 13:20
  • @St.Antario I did not measure, but that should be fairly trivial – Eugene Jan 29 '18 at 13:21
  • @Eugene Actually `summingInt` ended up 3 times faster. The unclear thing is that `summingInt` produces much more garbage with allocation rate `1953.630` vs `739.281` `MB/sec` on my machine... Still not clear why it's faster. – St.Antario Jan 29 '18 at 13:34
  • @St.Antario sounds so good for an investigation, but zero time on my side to look at :( really sorry – Eugene Jan 29 '18 at 13:35
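The alternative discussed in these comments can be sketched like this. On Java 8, `Collectors.counting()` is implemented as a reduction over boxed Longs, whereas `summingInt(x -> 1)` accumulates a primitive int per key; the printed map order is not guaranteed by HashMap, so it is shown only as an example.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SummingIntDemo {
    public static void main(String[] args) {
        Stream<String> src = Stream.of("a", "a", "a", "b", "b", "a", "a");

        // summingInt(x -> 1) counts occurrences by accumulating an int per key,
        // avoiding counting()'s reduction over boxed Longs on Java 8.
        Map<String, Integer> counts = src.collect(
                Collectors.groupingBy(Function.identity(), Collectors.summingInt(x -> 1)));

        System.out.println(counts); // e.g. {a=5, b=2}
    }
}
```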