30

I've come across a rule in Sonar which says:

A key difference with other intermediate Stream operations is that the Stream implementation is free to skip calls to peek() for optimization purpose. This can lead to peek() being unexpectedly called only for some or none of the elements in the Stream.

Also, it's mentioned in the Javadoc which says:

This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline

In which case can java.util.Stream.peek() be skipped? Is it related to debugging?

Nikolas Charalambidis
Evgeny Mamaev
  • 4
    For example if you read [a more up to date version of the documentation](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/stream/Stream.html#peek(java.util.function.Consumer)) it says *"In cases where the stream implementation is able to optimize away the production of some or all the elements (**such as with short-circuiting operations like findFirst, or in the example described in count()**), the action will not be invoked for those elements."* – Federico klez Culloca Aug 24 '22 at 08:56
  • 5
    I believe that the sonar description `Stream.peek() A key difference with other intermediate Stream operations is that the Stream implementation is free to skip calls to peek() for optimization purpose.` **is wrong**. `.peek` is not treated specially from `.map` operator where the following example showcases this: `Stream.of("A", "B", "C", "D").map(a ->{System.out.println(a);return a; }).count();` Here the `map` operator is skipped as well. This is just an optimization of Streams not related with specific operator peek. – Panagiotis Bougioukos Aug 24 '22 at 21:41
  • 1
    You can also scroll down and check my answer, where I explain that the example Sonar uses is based on lazy computation, which is just a part of the Stream API and not specific to `peek`. But this is important to understand so you don't fall into the trap that Sonar reports in this example. – Panagiotis Bougioukos Aug 25 '22 at 09:13
  • 2
    See also [In Java streams is peek really only for debugging?](https://stackoverflow.com/q/33635717/2711488) – Holger Aug 25 '22 at 13:40

3 Answers

27

Not only peek but also map can be skipped, for the sake of optimization. For example, when the terminal operation count() is called, there is no need to peek at or map the individual items, because such operations do not change the number of items.

Here are two examples:


1. Map and peek are not skipped because the filter can change the number of items beforehand.

long count = Stream.of("a", "aa")
    .peek(s -> System.out.println("#1"))
    .filter(s -> s.length() < 2)
    .peek(s -> System.out.println("#2"))
    .map(s -> {
        System.out.println("#3");
        return s.length();
    })
    .count();

Output (the last line is the value of count):

#1
#2
#3
#1
1

2. Map and peek are skipped because the number of items is unchanged.

long count = Stream.of("a", "aa")
    .peek(s -> System.out.println("#1"))
  //.filter(s -> s.length() < 2)
    .peek(s -> System.out.println("#2"))
    .map(s -> {
        System.out.println("#3");
        return s.length();
    })
    .count();

Output (only the value of count):

2
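
To observe the same effect without relying on console output, the calls to peek's action can be counted. The following is only a minimal sketch: the class name CountElisionDemo and the method countWithPeek are made up for illustration. On Java 9 and later the counter should typically stay at 0 for this pipeline, while Java 8 still traverses the elements, so the code deliberately prints the number instead of assuming it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical demo class, not part of any library.
final class CountElisionDemo {

    /** Runs the pipeline and returns { count, number of times peek's action ran }. */
    static long[] countWithPeek() {
        AtomicInteger peekCalls = new AtomicInteger();
        long count = Stream.of("a", "aa")
                .peek(s -> peekCalls.incrementAndGet()) // side effect that may be elided
                .map(String::length)                    // does not change the element count
                .count();
        return new long[] {count, peekCalls.get()};
    }

    public static void main(String[] args) {
        long[] result = countWithPeek();
        System.out.println("count = " + result[0]);
        // How often the action ran depends on the JDK version in use.
        System.out.println("peek calls = " + result[1]);
    }
}
```

The count itself is the only value the pipeline guarantees; the number of peek calls is an implementation detail.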

Important: The methods should have no side-effects (they do above, but only for the sake of example).

Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.

The following implementation is dangerous. Assuming the callRestApi method performs a REST call, the call may never be performed, because the pipeline relies on a side effect that the stream implementation is free to skip.

long count = Stream.of("url1", "url2")
    .map(string -> callRestApi(HttpMethod.POST, string))
    .count();
/**
 * Performs a REST call
 */
public String callRestApi(HttpMethod httpMethod, String url);
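
One way to avoid the problem is to perform the calls in code that cannot be elided and to count afterwards. This is a sketch under assumptions: ExplicitCallsDemo, callAll and the stubbed callRestApi (which only records the URL instead of doing any HTTP work) are hypothetical stand-ins, not the code from the answer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; callRestApi is a stub that only records the URL.
final class ExplicitCallsDemo {

    static final List<String> calledUrls = new ArrayList<>();

    /** Stand-in for a real REST call (assumption: the real one has a side effect). */
    static String callRestApi(String url) {
        calledUrls.add(url);
        return "response for " + url;
    }

    /** Calls the API once per URL in a plain loop and returns the number of responses. */
    static long callAll(List<String> urls) {
        List<String> responses = new ArrayList<>();
        for (String url : urls) {
            // A plain loop always runs its body, unlike map() before count().
            responses.add(callRestApi(url));
        }
        return responses.size();
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        urls.add("url1");
        urls.add("url2");
        System.out.println(callAll(urls) + " calls made: " + calledUrls);
    }
}
```

Here the side effect lives outside any stream operation that the implementation may skip, so every URL is called exactly once.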
Nikolas Charalambidis
  • 4
    Well, in theory, in example 1, the last `map` operation could be skipped, because after the last `filter` call, the element count cannot change. – MC Emperor Aug 24 '22 at 13:42
  • 2
    @MCEmperor there are a lot of theoretical optimization opportunities which are currently unused but may be used in a future version. That’s what makes relying on the absence of a legal optimization so dangerous. – Holger Aug 25 '22 at 11:56
  • 2
    If you want to have a bit more fun, use `IntStream.iterate(1, i -> i + 1) .flatMap(i -> IntStream.range(i, i + 10)) .peek(System.out::println) .filter(i -> i == 2) .findFirst() .ifPresent(System.out::println);` and compare the Java 8 output and, e.g. Java 11 output. Then, you might insert a `.parallel()` somewhere and see what happens then… – Holger Aug 25 '22 at 12:07
13

peek() is an intermediate operation, and it expects a consumer which performs an action (side-effect) on elements of the stream.

When a stream pipeline contains no intermediate operations that can change the number of elements (such as takeWhile, filter, or limit), ends with the terminal operation count(), and the stream source can report its number of elements, then count() simply queries the source and returns the result. All intermediate operations get optimized away.

Note: this optimization of count() operation, which exists since Java 9 (see the API Note), is not directly related to peek(), it would affect every intermediate operation which doesn't change the number of elements in the stream (for now these are map(), sorted(), peek()).

There's More to it

peek() has a very special niche among other intermediate operations.

By its nature, peek() differs both from other intermediate operations like map() and from the terminal operations forEach() and forEachOrdered(), which also act through side-effects but perform a final action for each element that reaches them.

The key point is that peek() doesn't contribute to the result of stream execution. It never affects the result produced by the terminal operation, whether it's a value or a final action.

In other words, if we throw away peek() from the pipeline, it would not affect the terminal operation.
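
A small sketch can make this concrete by running the same pipeline with and without peek() and comparing the results. The class and method names (PeekRemovalDemo, withPeek, withoutPeek) are invented for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of any library.
final class PeekRemovalDemo {

    /** Pipeline that logs each element through peek() before mapping. */
    static List<Integer> withPeek() {
        return Stream.of(1, 2, 3)
                .peek(n -> System.out.println("saw " + n)) // only a side effect
                .map(n -> n * 10)
                .collect(Collectors.toList());
    }

    /** The same pipeline with peek() removed. */
    static List<Integer> withoutPeek() {
        return Stream.of(1, 2, 3)
                .map(n -> n * 10)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Both pipelines produce the same list; only the logging differs.
        System.out.println(withPeek().equals(withoutPeek()));
    }
}
```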

The documentation of the peek() method, as well as the Stream API documentation, warns that its action could be elided and that you shouldn't rely on it.

A quote from the documentation of peek():

In cases where the stream implementation is able to optimize away the production of some or all the elements (such as with short-circuiting operations like findFirst, or in the example described in count()), the action will not be invoked for those elements.

A quote from the API documentation, paragraph Side-effects:

The eliding of side-effects may also be surprising. With the exception of terminal operations forEach and forEachOrdered, side-effects of behavioral parameters may not always be executed when the stream implementation can optimize away the execution of behavioral parameters without affecting the result of the computation.

Here's an example of the stream (link to the source) where none of the intermediate operations gets elided apart from peek():

Stream.of(1, 2, 3)
    .parallel()
    .peek(System.out::println)
    .skip(1)
    .map(n -> n * 10)
    .forEach(System.out::println);

In this pipeline peek() precedes skip(), so you might expect it to print every element from the source to the console. However, that doesn't happen (element 1 will not be printed). Due to the nature of peek(), it might be optimized away without breaking the code, i.e. without affecting the terminal operation.

That's why the documentation states that this operation exists mainly to support debugging, and it should not be given an action that must be executed under all circumstances.

Alexander Ivanchenko
4

The optimization discussed in this thread follows from the well-known architecture of Java streams, which is based on lazy computation.

Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed. (java doc)

Also

Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed. (java doc)

This lazy computation affects several other operators, not just .peek. All other intermediate operations (filter, map, mapToInt, mapToDouble, mapToLong, flatMap, flatMapToInt, flatMapToDouble, flatMapToLong) are affected in the same way as peek. But someone who does not understand the concept of lazy computation can easily fall into the trap with .peek that Sonar reports here.

So the example that Sonar correctly reports

Stream.of("one", "two", "three", "four")
                .filter(e -> e.length() > 3)
                .peek(e -> System.out.println("Filtered value: " + e));

should not be used as is, because it has no terminal operation. The stream will therefore never invoke the intermediate .peek operation, even though 2 elements ("three", "four") are eligible to pass through the stream pipeline.
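
This can be checked with a counter instead of console output. The sketch below uses a made-up class NoTerminalOperationDemo; it builds the same pipeline, never attaches a terminal operation, and reports how often the peek action ran.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical demo class, not part of any library.
final class NoTerminalOperationDemo {

    /** Builds the pipeline without consuming it and returns the number of peek calls. */
    static int peekCallsWithoutTerminalOperation() {
        AtomicInteger calls = new AtomicInteger();
        // Only describes the pipeline; nothing is traversed here.
        Stream<String> pipeline = Stream.of("one", "two", "three", "four")
                .filter(e -> e.length() > 3)
                .peek(e -> calls.incrementAndGet());
        // 'pipeline' is intentionally never given a terminal operation.
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("peek calls: " + peekCallsWithoutTerminalOperation()); // prints "peek calls: 0"
    }
}
```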

Example 1. Add a terminal operator like the following:

Stream.of("one", "two", "three", "four")
                .filter(e -> e.length() > 3)
                .peek(e -> System.out.println("Filtered value: " + e))
                .collect(Collectors.toList());  // <----

and every element that passes the filter will also pass through the intermediate .peek operation. No element is skipped in this example.

Example 2. Now here is the interesting part: because the Stream API is based on lazy computation, a different terminal operation, for example .findFirst, changes the outcome:

Stream.of("one", "two", "three", "four")
                .filter(e -> e.length() > 3)
                .peek(e -> System.out.println("Filtered value: " + e))
                .findFirst();  // <----

Only 1 element will pass through the operator .peek and not 2.
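
Recording the peeked elements shows which one that is. This is a sketch with invented names (FindFirstDemo, peekedBeforeFindFirst), and it assumes a sequential stream as in the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class, not part of any library.
final class FindFirstDemo {

    /** Returns the elements that reached peek() before findFirst() stopped the traversal. */
    static List<String> peekedBeforeFindFirst() {
        List<String> peeked = new ArrayList<>();
        Stream.of("one", "two", "three", "four")
                .filter(e -> e.length() > 3)
                .peek(peeked::add)   // records every element that gets this far
                .findFirst();        // short-circuits after the first match
        return peeked;
    }

    public static void main(String[] args) {
        // "four" would also pass the filter, but the stream stops at "three".
        System.out.println(peekedBeforeFindFirst()); // prints [three]
    }
}
```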

But as long as you know what you are doing (example 1) and understand lazy computation, you can expect that in certain cases .peek will be invoked for every element passing down the stream, and in other cases you will know which elements .peek skips.

But be extremely cautious when using .peek with parallel streams, since they bring another set of traps. As the Javadoc for .peek mentions:

For parallel stream pipelines, the action may be called at whatever time and in whatever thread the element is made available by the upstream operation. If the action modifies shared state, it is responsible for providing the required synchronization.
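
As a rough illustration of "providing the required synchronization", the sketch below lets the peek action write to a thread-safe queue. The class name ParallelPeekDemo and the method peekedInParallel are made up, and the order in which elements arrive in the queue is unspecified, so the result is sorted before it is returned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of any library.
final class ParallelPeekDemo {

    /** Collects the peeked elements of a parallel stream into a thread-safe queue. */
    static List<String> peekedInParallel() {
        // A plain ArrayList would not be safe here: peek may run on several threads.
        Queue<String> seen = new ConcurrentLinkedQueue<>();
        Stream.of("one", "two", "three", "four")
                .parallel()
                .peek(seen::add)
                .collect(Collectors.toList()); // terminal operation that needs every element
        List<String> sorted = new ArrayList<>(seen);
        Collections.sort(sorted); // arrival order is unspecified, so normalize it
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(peekedInParallel());
    }
}
```

Even with a thread-safe collection, this only makes the shared state safe; it does not make peek a reliable place for actions that must always run.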

Panagiotis Bougioukos
  • 3
    See the other answers as when to `peek` will be skipped due to stream optimizations. Your examples only explain lazy evaluation. `findFirst()` will only evaluate the items until the first one is found, this is independent of peek. For instance `Stream.of("A", "B", "C", "D").peek(System.out::println).count()` will "consume" the full stream (terminal operation), _but_ the items won't be printed (peek is optimized out); cf. [Stream#count](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/stream/Stream.html#count()) – knittl Aug 24 '22 at 19:40
  • 1
    @knittl No this is just the way `.count` is working and has nothing to do with `.peek`. What you mention in the above comment is wrong : `will "consume" the full stream (terminal operation`. As the documentation states: `An implementation may choose to not execute the stream pipeline (either sequentially or in parallel) if it is capable of computing the count directly from the stream source. In such cases no source elements will be traversed and no intermediate operations will be evaluated.` So in that case even if `map` operator was used no elements would have passed from that operator too. – Panagiotis Bougioukos Aug 24 '22 at 21:08
  • 3
    @knittl the above example which you posted in your comment you could try it again with `Stream.of("A", "B", "C", "D").map(a ->{System.out.println(a);return a; }).count();`. You will see that also the map operator is not invoked. This is not an optimization related with peek, but some specific way the terminal operator `count` is able to work. – Panagiotis Bougioukos Aug 24 '22 at 21:22
  • @knittl Apart from that, this question is rooted in the Sonar issue https://rules.sonarsource.com/java/RSPEC-3864. And the Sonar example is exactly related to lazy computation – Panagiotis Bougioukos Aug 24 '22 at 21:29
  • 3
    There is indeed no difference between these types of intermediate stages regarding whether they might get optimized away or not. It’s worth noting that these stages might also perform more work than naïvely expected, e.g. `IntStream.iterate(1, i -> i + 1).parallel() .map(i -> { System.out.println("map " + i); return i; }) .peek(i -> System.out.println("peek " + i)) .anyMatch(i -> i == 2);` But there is one difference, though; the other operations are not supposed to have side effects, so it shouldn’t matter which elements are processed. But `peek` can only operate through side effects. – Holger Aug 25 '22 at 13:37