
I thought that every stream pipeline written using flatMap() could be converted to use mapMulti(). It looks like I was wrong when the function passed to flatMap() or mapMulti() returns or operates on an infinite stream.

Note: this is for educational purposes only.

When we map an element to an infinite stream inside a flatMap() followed by a limit(), the stream pipeline is lazy and evaluates only as many elements as are required.

list.stream()
    .flatMap(element -> Stream.generate(() -> 1))
    .limit(3)
    .forEach(System.out::println);

Output:

1
1
1

But when doing the same in a mapMulti(), the pipeline still looks lazy, i.e. only three elements are printed. However, when running this in an IDE (IntelliJ), it hangs and never terminates (I guess it is waiting for the other elements to be consumed), so execution never leaves the stream pipeline.

With a mapMulti(),

list.stream()
    .mapMulti((element, consumer) -> {
        Stream.generate(() -> 1)
            .forEach(consumer);
    })
    .limit(3)
    .forEach(System.out::println);
System.out.println("Done"); // Never gets here

Output:

1
1
1

But the last print (Done) doesn't get executed.

Is this the expected behaviour? I couldn't find any warning or note about infinite streams and mapMulti() in the Javadoc.

Alexander Ivanchenko
Thiyagu
  • This has nothing to do with `mapMulti`. Why do you expect that `Stream.generate(() -> 1).forEach(consumer)` will ever terminate? – Lino Nov 17 '22 at 13:31
  • @Lino Sure. It doesn't. My point of this exercise is comparing behaviour of `mapMulti` and `flatMap`. It wouldn't terminate when used in a `flatMap` as well. But using flatMap, the stream pipeline is able to terminate based on the other *short-circuiting* operators (like limit or findFirst) – Thiyagu Nov 17 '22 at 13:36

1 Answer

The advantage of mapMulti() is that it pushes the new elements directly into the stream in place of the initial element (as opposed to flatMap(), which internally creates a new stream for each element). If you build a fully-fledged stream with a terminal operation inside mapMulti(), that terminal operation gets executed. And you've created an infinite stream that can't terminate (as @Lino pointed out in the comments).

flatMap(), by contrast, expects a function that produces a stream, i.e. the function only returns the stream and does not process it.
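To make the contrast concrete, here is a minimal sketch (not taken from the question; the class name, method names and the `perElement` parameter are made up for illustration, and Java 16+ is assumed for mapMulti()). The flatMap() variant merely returns the infinite stream and lets limit() stop pulling from it, while the mapMulti() variant pushes a bounded number of elements so that its function returns by itself:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatMapVsMapMulti {

    // flatMap: the function only returns the infinite stream; the pipeline
    // pulls from it on demand, so limit(3) can short-circuit (Java 10+).
    static List<Integer> withFlatMap(List<String> source) {
        return source.stream()
                .flatMap(element -> Stream.generate(() -> 1))
                .limit(3)
                .collect(Collectors.toList());
    }

    // mapMulti: the function pushes elements itself and has to return on its
    // own, so this hypothetical variant emits only perElement items per input.
    static List<Integer> withMapMulti(List<String> source, int perElement) {
        return source.stream()
                .<Integer>mapMulti((element, consumer) -> {
                    for (int i = 0; i < perElement; i++) {
                        consumer.accept(1);
                    }
                })
                .limit(3)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list = List.of("a", "b");
        System.out.println(withFlatMap(list));
        System.out.println(withMapMulti(list, 5));
        System.out.println("Done"); // reached, because both functions return
    }
}
```

The explicit `<Integer>` type witness is only there so the result can be collected into a `List<Integer>`; without it the compiler infers `Stream<Object>`, as in the question's snippet.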

Here's a quote from the API note that emphasizes the difference between the two operations:

API Note:

This method is similar to flatMap in that it applies a one-to-many transformation to the elements of the stream and flattens the result elements into a new stream. This method is preferable to flatMap in the following circumstances:

  • When replacing each stream element with a small (possibly zero) number of elements. Using this method avoids the overhead of creating a new Stream instance for every group of result elements, as required by flatMap.
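As an illustration of the case the API note describes, the following sketch (my own example, with hypothetical names) replaces each even number with two elements and drops the odd ones, without creating a Stream per element. It assumes Java 16+:

```java
import java.util.List;
import java.util.stream.Collectors;

class MapMultiSmallFanOut {

    // Each even number becomes two strings; odd numbers produce nothing.
    // The function returns after at most two accept() calls.
    static List<String> expand(List<Integer> numbers) {
        return numbers.stream()
                .<String>mapMulti((n, consumer) -> {
                    if (n % 2 == 0) {
                        consumer.accept(n + "a");
                        consumer.accept(n + "b");
                    }
                })
                .collect(Collectors.toList());
    }
}
```

For example, `MapMultiSmallFanOut.expand(List.of(1, 2, 3, 4))` yields `[2a, 2b, 4a, 4b]`.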
Alexander Ivanchenko
  • This makes sense. So, semantically they are different. – Thiyagu Nov 17 '22 at 13:40
  • flatMap had this [bug](https://stackoverflow.com/a/29230939) initially when it wasn't lazy. I got curious if mapMulti also has the same bug. Understood the differences between them. – Thiyagu Nov 17 '22 at 13:50
  • @user7 These operations have a similar purpose but act differently (otherwise there would be no point in introducing `mapMulti`). They both have pros and cons. `mapMulti()` can give a performance gain in certain circumstances (see the quote and link in the answer), but if you need to generate a new stream internally, it defeats the purpose of using `mapMulti`. It can't save you from the overhead of creating a new stream when you're generating one for some purpose; in such cases go with `flatMap()`. And consider `flatMap()` a general-purpose operation and `mapMulti()` a special-purpose operation – Alexander Ivanchenko Nov 17 '22 at 13:51
  • @user7 The combination `mapMulti(). some-operations .findFirst()` would trigger all stateless operations in between (like `map`, `filter`) to be executed. So it's not lazy, although intuitively it's expected to be. For now, I'm not sure whether we should call it a bug. – Alexander Ivanchenko Nov 17 '22 at 14:13
  • You don’t have to consume the stream if you have a stream inside the `mapMulti`, e.g. `Stream.of("").mapMulti((o, c) -> Stream.of("ignored")).findAny();` works fine. What matters, and that’s entirely independent of streams, is that the function passed to `mapMulti` *must complete in finite time*. E.g. `Stream.of("").mapMulti((o, c) -> { for(;;); }).findAny();` will also hang forever, obviously, even though there is no stream to consume. You can apply this logic to Stream operations within the function: `Stream.generate(() -> 1) .forEach(consumer);` never returns, just like `for(;;);` never returns – Holger Nov 17 '22 at 18:57
  • @Holger Yes, I agree. I've meant that if there's a fully-fledged stream with a terminal operation inside `mapMulti` it should be executed (*that's what I've written in the parenthesis*), i.e. the stream pipeline is considered consumed by the terminal operation whatever it is. I didn't imply that *mapMulti's* consumer should be plugged-in into this nested stream, that was not my point. I've meant that it should be able to terminate (precisely as you). I'll remove highlighted word **consumed** and change the phrasing. – Alexander Ivanchenko Nov 17 '22 at 19:57
  • @Holger Can you please shed some light on the difference in behavior of [these two streams](https://www.jdoodle.com/ia/zzY)? Can we call it a bug? – Alexander Ivanchenko Nov 17 '22 at 20:20
  • That’s a performance issue, but it does not contradict the specification. The documentation of `mapMulti` even says that it’s preferable “*with a small (possibly zero) number of elements*” (you cited it) when the overhead of creating a new Stream matters. When you have a large number of elements or heavy calculations following, using `flatMap` is preferable. – Holger Nov 18 '22 at 12:37
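The difference discussed in the last few comments can be observed with a small counting sketch. This is my own illustration, not the code behind the jdoodle link, and the class and method names are hypothetical. It counts how many elements reach a peek() stage placed between the one-to-many step and findFirst(), assuming Java 16+ and the OpenJDK implementation; the counts reflect implementation behavior rather than a guarantee of the specification:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class ShortCircuitComparison {

    // flatMap: the inner stream is pulled element by element, so findFirst()
    // can stop after the first one (lazy since Java 10).
    static int seenAfterFlatMap(int fanOut) {
        AtomicInteger seen = new AtomicInteger();
        Stream.of("x")
                .flatMap(e -> IntStream.range(0, fanOut).boxed())
                .peek(i -> seen.incrementAndGet())
                .findFirst();
        return seen.get();
    }

    // mapMulti: the function pushes every element downstream before it
    // returns, so the stage in between sees all of them.
    static int seenAfterMapMulti(int fanOut) {
        AtomicInteger seen = new AtomicInteger();
        Stream.of("x")
                .<Integer>mapMulti((e, consumer) -> {
                    for (int i = 0; i < fanOut; i++) {
                        consumer.accept(i);
                    }
                })
                .peek(i -> seen.incrementAndGet())
                .findFirst();
        return seen.get();
    }
}
```

With a fan-out of 5, the flatMap() variant should let a single element through to peek(), whereas the mapMulti() variant lets all five through. That is extra work rather than incorrect output, which matches the "performance issue, not a bug" reading above.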