
I have a Stream<T>; is it possible to generate a Map<T, Long> whose count increases by one for every element? I'd also like it to swap T, Long to Long, T before storing it in the Map. Example input/output:

Example contents of Stream<String>:

kgj
def
dgh

Wanted output for first question:

(kgj, 1)
(def, 2)
(dgh, 3)

Wanted output in the end:

(1, kgj)
(2, def)
(3, dgh)

What I have so far, for a Stream<VertexData>:

    primitives.stream()
            .flatMap(primitive -> primitive.getVertexData().stream())
            .distinct()
            .collect(Collectors.groupingBy(i -> i, Collectors.counting()))
            .forEach((k, v) -> System.out.println("k = " + k + " / v = " + v));

This currently turns a Stream<Primitive> into a Stream<VertexData> with distinct elements, and then into a Map<VertexData, Long> that counts occurrences. That is not what I want: I want it to keep counting up at every element that passes.

Is there a way to do what I ask?

Stuart Marks
skiwi
  • http://stackoverflow.com/questions/18552005/is-there-a-concise-way-to-iterate-over-a-stream-with-indices-in-java-8 – assylias Mar 06 '14 at 10:02
  • @assylias I'd say it provides a workaround and thanks for guiding me to that question, however it's not a definitive answer, hence why you posted it as a comment I think. – skiwi Mar 06 '14 at 10:05
  • What would you like your output for the first question to be when using inputs with repetitions (such as ["abc", "def", "abc"])? – jpvee Mar 06 '14 at 11:28
  • @jpvee It should only count distinct elements (I had that already in *what I have so far*); with streams it is very easy to get a stable distinct with `list.stream().distinct()`. – skiwi Mar 06 '14 at 11:31
  • In scala I would just `x.zipWithIndex` and for the second: `x.zipWithIndex.map(x => (x._2,x._1))` – Jatin Mar 07 '14 at 07:25
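For reference, the zipWithIndex idea from the last comment can be approximated in Java 8 once the elements are available as a List, by streaming over the indices instead of the elements (a sketch; the class name is my own):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ZipWithIndex {
    public static void main(String[] args) {
        List<String> elements = Arrays.asList("kgj", "def", "dgh");

        // Stream over positions, like Scala's zipWithIndex (1-based here),
        // and map each index to the element at that position
        Map<Long, String> indexed = IntStream.range(0, elements.size())
                .boxed()
                .collect(Collectors.toMap(i -> i + 1L, elements::get));

        indexed.forEach((k, v) -> System.out.println("(" + k + ", " + v + ")"));
    }
}
```

The drawback is that this needs a List (or another random-access source) rather than an arbitrary Stream, which is exactly why the question asks for a collector-based solution.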

2 Answers


What you can do is write your own Collector that does the counting for you on the elements encountered. Something like the following works:

  Stream<String> strings = Stream.of("kgj", "def", "dgh");

  strings.distinct().collect(Collector.of(
          HashMap::new,
          (BiConsumer<Map<String, Long>, String>) (map, str) -> {
            map.put(str, Long.valueOf(map.size() + 1));
          },
          (left, right) -> {
            long s = left.size();
            right.entrySet().forEach(e -> left.put(e.getKey(),
                                                   Long.valueOf(
                                                           e.getValue()
                                                                   .longValue()
                                                           + s)));
            return left;
          })).forEach((k, v) -> System.out.println("k = " + k + " / v = " + v));

Note that the (somewhat complex) combiner (the third argument to Collector.of()) is needed in cases where the stream is processed in parallel.
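To see the combiner at work, the same collector can be pulled out into a variable and exercised on a parallel stream (a sketch of the collector above, lightly condensed; the variable names are my own):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class CountingCollectorDemo {
    public static void main(String[] args) {
        Collector<String, Map<String, Long>, Map<String, Long>> counting = Collector.of(
                HashMap::new,
                // accumulator: number each new element by the map's current size
                (map, str) -> map.put(str, (long) map.size() + 1),
                // combiner: shift the right-hand counts past the left-hand ones
                (left, right) -> {
                    long s = left.size();
                    right.forEach((k, v) -> left.put(k, v + s));
                    return left;
                });

        // On a parallel stream each chunk is accumulated into its own map,
        // and the combiner merges them while preserving encounter order.
        Map<String, Long> result = Stream.of("kgj", "def", "dgh")
                .parallel()
                .distinct()
                .collect(counting);

        result.forEach((k, v) -> System.out.println("k = " + k + " / v = " + v));
    }
}
```

Because the source stream is ordered, collect combines the partial maps in encounter order, so the numbering comes out the same as in the sequential case.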

jpvee
  • Any clue what this would do to the performance? And is there a way to 'disable' parallel processing, as you can never count in parallel anyway... Or can you? – skiwi Mar 06 '14 at 12:44

Inspired by @jpvee, I made a Collector that answers the last subquestion of the question and intends to be somewhat clearer as well:

public static <T> Collector<T, ?, Map<Long, T>> indexing() {
    return Collector.of(
            HashMap::new,
            (map, t) -> map.put(Long.valueOf(map.size() + 1), t),
            (left, right) -> {
                final long size = left.size();
                right.forEach((k, v) -> left.put(k + size, v));
                return left;
            }
            // deliberately no CONCURRENT characteristic: a plain HashMap is
            // not thread-safe, so parallel streams must accumulate into
            // separate maps that are then merged by the combiner above
    );
}

What this does is:

  • Applied to a Stream<T>, it returns a Map<Long, T> that indexes each T by the encounter order of the stream.
  • It uses Collector.of, which in plain terms accepts the following:
    • A supplier for the Map.
    • An accumulator that adds one element to the Map.
    • A combiner that combines two Maps that possibly came from concurrent access.
    • An optional list of characteristics.

So what I have provided is:

  • Supplier<R>: A new HashMap.
  • BiConsumer<R, T>: A function that takes the map and adds a new element.
  • BinaryOperator<R>: A function that combines two maps.
  • Collector.Characteristics: none are passed; in particular, CONCURRENT would be incorrect here, because a plain HashMap cannot safely be updated from multiple threads at once.
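A quick usage sketch of the collector above, using the example strings from the question (the enclosing class name is my own):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class IndexingDemo {
    // The indexing collector: numbers each element by encounter order
    public static <T> Collector<T, ?, Map<Long, T>> indexing() {
        return Collector.of(
                HashMap::new,
                (map, t) -> map.put((long) map.size() + 1, t),
                (left, right) -> {
                    final long size = left.size();
                    right.forEach((k, v) -> left.put(k + size, v));
                    return left;
                });
    }

    public static void main(String[] args) {
        Map<Long, String> indexed = Stream.of("kgj", "def", "dgh")
                .distinct()
                .collect(indexing());

        indexed.forEach((k, v) -> System.out.println("(" + k + ", " + v + ")"));
    }
}
```

This produces the (1, kgj), (2, def), (3, dgh) mapping asked for at the end of the question.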
skiwi
  • Kind of an interesting use of collectors, but it ends up being *O(n log n)* in time. The reason is that each merge copies exactly half the elements, and there are *log n* levels in the merge tree. This could be run in parallel, but sufficiently large *n* will swamp any parallel speedup. – Stuart Marks Mar 07 '14 at 03:22
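Following up on that comment: if the O(n log n) merging is a concern, one alternative (a sketch of my own, not from either answer) is to materialize the distinct elements into a list first and then index by position, which is linear overall:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class LinearIndexing {
    public static void main(String[] args) {
        // Materialize the distinct elements, preserving encounter order
        List<String> distinct = Stream.of("kgj", "def", "dgh", "kgj")
                .distinct()
                .collect(Collectors.toList());

        // Index by list position; no merge tree, O(n) overall
        Map<Long, String> indexed = IntStream.range(0, distinct.size())
                .boxed()
                .collect(Collectors.toMap(i -> i + 1L, distinct::get));

        System.out.println(indexed); // {1=kgj, 2=def, 3=dgh}
    }
}
```

The trade-off is an extra pass and an intermediate list, in exchange for avoiding the size-shifting combiner entirely.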