
I have a Stream<T>; is it possible to generate a Map<T, Long> whose count increases by one for every element? I'd also like it to swap T, Long to Long, T before storing it in the Map. Example input/output:

Example contents of Stream<String>:

kgj
def
dgh

Wanted output for first question:

(kgj, 1)
(def, 2)
(dgh, 3)

Wanted output in the end:

(1, kgj)
(2, def)
(3, dgh)

What I have so far, for a Stream<VertexData>:

    primitives.stream()
            .flatMap(primitive -> primitive.getVertexData().stream())
            .distinct()
            .collect(Collectors.groupingBy(i -> i, Collectors.counting()))
            .forEach((k, v) -> System.out.println("k = " + k + " / v = " + v));

This currently turns a Stream<Primitive> into a Stream<VertexData> with distinct elements, and then into a Map<VertexData, Long> that counts occurrences. That is not what I want: I want it to keep counting up at every element that passes.

Is there a way to do what I ask?

Stuart Marks
skiwi
  • http://stackoverflow.com/questions/18552005/is-there-a-concise-way-to-iterate-over-a-stream-with-indices-in-java-8 – assylias Mar 06 '14 at 10:02
  • @assylias I'd say it provides a workaround and thanks for guiding me to that question, however it's not a definitive answer, hence why you posted it as a comment I think. – skiwi Mar 06 '14 at 10:05
  • What would you like your output for the first question to be when using inputs with repetitions (such as ["abc", "def", "abc"])? – jpvee Mar 06 '14 at 11:28
  • @jpvee It should only count distinct elements (I had that already in *what I have so far*); with streams it is very easy to get a stable distinct with `list.stream().distinct()`. – skiwi Mar 06 '14 at 11:31
  • In scala I would just `x.zipWithIndex` and for the second: `x.zipWithIndex.map(x => (x._2,x._1))` – Jatin Mar 07 '14 at 07:25
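For reference, the zipWithIndex idea from the last comment can be approximated in Java 8 once the elements are available as a List, by streaming over the indices instead of the elements (a sketch; the class name is my own):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ZipWithIndex {
    public static void main(String[] args) {
        List<String> elements = Arrays.asList("kgj", "def", "dgh");

        // Stream over positions, like Scala's zipWithIndex (1-based here),
        // and map each index to the element at that position
        Map<Long, String> indexed = IntStream.range(0, elements.size())
                .boxed()
                .collect(Collectors.toMap(i -> i + 1L, elements::get));

        indexed.forEach((k, v) -> System.out.println("(" + k + ", " + v + ")"));
    }
}
```

The drawback is that this needs a List (or another random-access source) rather than an arbitrary Stream, which is exactly why the question asks for a collector-based solution.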

2 Answers


What you can do is write your own Collector that does the counting for you on the elements encountered. Something like the following works:

  Stream<String> strings = Stream.of("kgj", "def", "dgh");

  strings.distinct().collect(Collector.of(
          HashMap::new,
          (BiConsumer<Map<String, Long>, String>) (map, str) -> {
            map.put(str, Long.valueOf(map.size() + 1));
          },
          (left, right) -> {
            long s = left.size();
            right.entrySet().forEach(e -> left.put(e.getKey(),
                                                   Long.valueOf(
                                                           e.getValue()
                                                                   .longValue()
                                                           + s)));
            return left;
          })).forEach((k, v) -> System.out.println("k = " + k + " / v = " + v));

Note that the (somewhat complex) combiner (the third argument to Collector.of()) is needed in cases where the stream is processed in parallel.
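To see the combiner at work, the same collector can be pulled out into a variable and exercised on a parallel stream (a sketch of the collector above, lightly condensed; the variable names are my own):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class CountingCollectorDemo {
    public static void main(String[] args) {
        Collector<String, Map<String, Long>, Map<String, Long>> counting = Collector.of(
                HashMap::new,
                // accumulator: number each new element by the map's current size
                (map, str) -> map.put(str, (long) map.size() + 1),
                // combiner: shift the right-hand counts past the left-hand ones
                (left, right) -> {
                    long s = left.size();
                    right.forEach((k, v) -> left.put(k, v + s));
                    return left;
                });

        // On a parallel stream each chunk is accumulated into its own map,
        // and the combiner merges them while preserving encounter order.
        Map<String, Long> result = Stream.of("kgj", "def", "dgh")
                .parallel()
                .distinct()
                .collect(counting);

        result.forEach((k, v) -> System.out.println("k = " + k + " / v = " + v));
    }
}
```

Because the source stream is ordered, collect combines the partial maps in encounter order, so the numbering comes out the same as in the sequential case.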

jpvee
  • Any clue what this would do to the performance? And is there a way to 'disable' parallel processing, as you can never count in parallel anyway... Or can you? – skiwi Mar 06 '14 at 12:44

Inspired by @jpvee, I made a Collector that answers the last subquestion of the question and intends to be somewhat clearer as well:

public static <T> Collector<T, ?, Map<Long, T>> indexing() {
    return Collector.of(
            HashMap::new,
            (map, t) -> map.put(Long.valueOf(map.size() + 1), t),
            (left, right) -> {
                final long size = left.size();
                right.forEach((k, v) -> left.put(k + size, v));
                return left;
            }
            // deliberately no CONCURRENT characteristic: a plain HashMap is
            // not thread-safe, so parallel streams must accumulate into
            // separate maps that are then merged by the combiner above
    );
}

What this does is:

  • Applied to a Stream<T>, it returns a Map<Long, T> that indexes each T by the encounter order of the stream.
  • It uses Collector.of, which in plain terms accepts the following:
    • A supplier for the Map.
    • An accumulator that adds one element to the Map.
    • A combiner that combines two Maps that possibly came from concurrent access.
    • An optional list of characteristics.

So what I have provided is:

  • Supplier<R>: A new HashMap.
  • BiConsumer<R, T>: A function that takes the map and adds a new element.
  • BinaryOperator<R>: A function that combines two maps.
  • Collector.Characteristics: none are passed; in particular, CONCURRENT would be incorrect here, because a plain HashMap cannot safely be updated from multiple threads at once.
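A quick usage sketch of the collector above, using the example strings from the question (the enclosing class name is my own):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class IndexingDemo {
    // The indexing collector: numbers each element by encounter order
    public static <T> Collector<T, ?, Map<Long, T>> indexing() {
        return Collector.of(
                HashMap::new,
                (map, t) -> map.put((long) map.size() + 1, t),
                (left, right) -> {
                    final long size = left.size();
                    right.forEach((k, v) -> left.put(k + size, v));
                    return left;
                });
    }

    public static void main(String[] args) {
        Map<Long, String> indexed = Stream.of("kgj", "def", "dgh")
                .distinct()
                .collect(indexing());

        indexed.forEach((k, v) -> System.out.println("(" + k + ", " + v + ")"));
    }
}
```

This produces the (1, kgj), (2, def), (3, dgh) mapping asked for at the end of the question.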
skiwi
  • Kind of an interesting use of collectors, but it ends up being *O(n log n)* in time. The reason is that each merge copies exactly half the elements, and there are *log n* levels in the merge tree. This could be run in parallel, but sufficiently large *n* will swamp any parallel speedup. – Stuart Marks Mar 07 '14 at 03:22
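Following up on that comment: if the O(n log n) merging is a concern, one alternative (a sketch of my own, not from either answer) is to materialize the distinct elements into a list first and then index by position, which is linear overall:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class LinearIndexing {
    public static void main(String[] args) {
        // Materialize the distinct elements, preserving encounter order
        List<String> distinct = Stream.of("kgj", "def", "dgh", "kgj")
                .distinct()
                .collect(Collectors.toList());

        // Index by list position; no merge tree, O(n) overall
        Map<Long, String> indexed = IntStream.range(0, distinct.size())
                .boxed()
                .collect(Collectors.toMap(i -> i + 1L, distinct::get));

        System.out.println(indexed); // {1=kgj, 2=def, 3=dgh}
    }
}
```

The trade-off is an extra pass and an intermediate list, in exchange for avoiding the size-shifting combiner entirely.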