
Often I find the need to concatenate Stream instances together. If there are just two then the Stream.concat(Stream,Stream) utility method is perfect, but often I have 3 or more to combine and I'd like to identify the best construct to achieve that.

Clearly I could compose calls to Stream.concat(Stream,Stream), but this seems a little illegible, especially if the composed streams are each more complicated expressions:

Stream.concat(a, Stream.concat(b, Stream.concat(c, d)))

Recently I've been favouring using .flatMap(Function.identity()) as a more succinct variation that I think reads more clearly:

Stream.of(a, b, c, d).flatMap(Function.identity())

Although, in writing up this question, it also occurred to me that an even more succinct option would be to use Stream.concat(Stream,Stream) in a reduce(..) call:

Stream.of(a, b, c, d).reduce(Stream::concat)

I notice the Javadoc warns against repeated concatenation, though, which made me wonder: would the flatMap(..) approach have the same limitation? Are there other pros and cons I should be aware of before making an otherwise aesthetic choice?

Rob Oxspring
    There’s a difference between `concat(concat(a, b), concat(c, d))` and `concat(a, concat(b, concat(c, d)))`. While semantically equivalent, the latter is *unbalanced* and may easily get out of hand. Using `concat` with `reduce` will produce unbalanced streams. Compare with [this answer](https://stackoverflow.com/a/59885137/2711488). Each chained `flatMap` creates exactly one level of nesting, regardless of the number of elements. – Holger Feb 17 '21 at 14:11

1 Answer


would the flatMap(..) approach have the same limitation?

The documentation for flatMap says that the stream it returns is the result of replacing each element of the old stream with the contents of the stream produced by applying the mapping function to that element. This wording suggests (to me) that there isn't any nesting of the streams, even though it isn't entirely explicit about this. When the pipeline receives an element, it just applies the function to it and calls forEach on the returned stream, doing whatever the downstream operation requires.

With concat, however, there is clearly nesting. Each concat creates a Stream that has 2 parts. When you have a stream pipeline formed by repeated concatenation, downstream code is executed for each element in the first part first, then for each element in the second part. But since this is a heavily nested concat, one of the parts is itself divided into two parts, and one of those parts again has two parts... To reach the most deeply nested element, you have to go through a lot of calls.

Here's some code that illustrates this.

Suppose we have the streams a to g:

var a = Stream.of(1);
var b = Stream.of(2);
var c = Stream.of(3);
var d = Stream.of(4);
var e = Stream.of(5);
var f = Stream.of(6);
var g = Stream.of(7);

We can concat them using all three ways and print the stack trace length:

// 1
Stream.of(a, b, c, d, e, f, g).flatMap(Function.identity()).forEachOrdered(x -> {
  System.out.println(new Exception().getStackTrace().length);
});

// 2
Stream.concat(Stream.concat(Stream.concat(Stream.concat(Stream.concat(Stream.concat(a, b), c), d), e), f), g)
    .forEachOrdered(x -> {
      System.out.println(new Exception().getStackTrace().length);
    });

// 3
Stream.of(a, b, c, d, e, f, g).reduce(Stream::concat).get().forEachOrdered(x -> {
  System.out.println(new Exception().getStackTrace().length);
});

(Note that a stream can only be consumed once, so you should run only one of these at a time.)

On my machine, (1) prints 13 seven times no matter how many streams I pass into of. Both (2) and (3) print:

10
10
9
8
7
6
5

The first elements 1 and 2 are the most deeply nested, which is why they have the longest stack trace.
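As Holger's comment on the question points out, the deep nesting comes from combining the streams in an unbalanced chain; a balanced combination keeps the concat tree at logarithmic depth. Here is a minimal sketch of such a helper (concatAll is my own hypothetical name, not a standard library method):

```java
import java.util.List;
import java.util.stream.Stream;

public class BalancedConcat {

    // Concatenates a list of streams by splitting the list in half
    // recursively, so the resulting Stream.concat tree has depth
    // O(log n) rather than the O(n) of a left- or right-leaning chain.
    static <T> Stream<T> concatAll(List<Stream<T>> streams) {
        int n = streams.size();
        if (n == 0) return Stream.empty();
        if (n == 1) return streams.get(0);
        int mid = n / 2;
        return Stream.concat(
                concatAll(streams.subList(0, mid)),
                concatAll(streams.subList(mid, n)));
    }

    public static void main(String[] args) {
        var parts = List.of(Stream.of(1), Stream.of(2), Stream.of(3),
                            Stream.of(4), Stream.of(5));
        // Encounter order is preserved: prints 1 through 5, one per line.
        concatAll(parts).forEachOrdered(System.out::println);
    }
}
```

This still pays concat's per-level cost, but the depth grows with the logarithm of the number of streams instead of linearly, so it avoids the worst of the stack growth shown above.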

Sweeper
    Why not max it out? `IntStream.range(0, 1_000_000) .mapToObj(Stream::of) .flatMap(Function.identity()).forEach(c -> {});` versus `IntStream.range(0, 1_000_000) .mapToObj(Stream::of) .reduce(Stream::concat).get().forEach(c -> {});` It makes the difference between completing in a few milliseconds and throwing a `StackOverflowError` after some seconds… – Holger Feb 17 '21 at 14:18