214

I'm reading up about Java streams and discovering new things as I go along. One of the new things I found was the peek() function. Almost everything I've read on peek says it should be used to debug your Streams.

What if I had a Stream where each Account has a username, password field and a login() and loggedIn() method.

I also have

Consumer<Account> login = account -> account.login();

and

Predicate<Account> loggedIn = account -> account.loggedIn();

Why would this be so bad?

List<Account> accounts; //assume it's been setup
List<Account> loggedInAccount = 
accounts.stream()
    .peek(login)
    .filter(loggedIn)
    .collect(Collectors.toList());

Now as far as I can tell this does exactly what it's intended to do. It;

  • Takes a list of accounts
  • Tries to log in to each account
  • Filters out any account which aren't logged in
  • Collects the logged in accounts into a new list

What is the downside of doing something like this? Any reason I shouldn't proceed? Lastly, if not this solution then what?

The original version of this used the .filter() method as follows;

.filter(account -> {
        account.login();
        return account.loggedIn();
    })
Mike G
  • 4,232
  • 9
  • 40
  • 66
Adam.J
  • 2,519
  • 3
  • 14
  • 12
  • 71
    Any time I find myself needing a multi-line lambda, I move the lines to a private method and pass the method reference instead of the lambda. – VGR Nov 10 '15 at 17:18
  • Yeah I understand this. I was just trying to more clearly demonstrate what I'm trying to achieve. Thanks though :) – Adam.J Nov 10 '15 at 17:35
  • 1
    What's the intent - are you trying to log all accounts in *and* filter them based on if they're logged in (which may be trivially true)? Or, do you want to log them in, *then* filter them based on whether or not they've logged in? I'm asking this in this order because `forEach` may be the operation you want as opposed to `peek`. Just because it's in the API doesn't mean it's not open for abuse (like `Optional.of`). – Makoto Nov 10 '15 at 17:43
  • 1
    Filter based in if they have actually logged in. For example if the username is wrong it won't log in. So I then want to check if it is or isn't logged in. If it's not then it'll get tossed by the filter. – Adam.J Nov 10 '15 at 17:46
  • 13
    Also note that your code could just be `.peek(Account::login)` and `.filter(Account::loggedIn)`; there's no reason to write a Consumer and Predicate that just calls another method like that. – Joshua Taylor Nov 10 '15 at 21:32
  • 2
    Also note that the stream API [explicitly discourages side-effects](https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#Statelessness) in _behavioural parameters_. – Didier L Nov 10 '15 at 22:27
  • @DidierL Okay so would the following be discourage? Consumer login = account -> getWebsite(account.getUrl()).login(account.getUsername(), account.getPassword()) – Adam.J Nov 10 '15 at 23:41
  • 8
    Useful consumers always have side-effects, those are not discouraged of course. This is actually mentioned in the same section: “_A small number of stream operations, such as `forEach()` and `peek()`, can operate only via side-effects; these should be used with care._”. My remark was more to remind that the `peek` operation (which is designed for debugging purposes) should not be replaced by doing the same thing inside another operation like `map()` or `filter()`. – Didier L Nov 10 '15 at 23:53
  • @DidierL Okay thanks. So I would call forEach(login) on the accounts. Then when I want to do something with the logged in accounts filter using a loggedIn predicate? Starting to make more sense now. Thanks – Adam.J Nov 10 '15 at 23:55
  • 1
    Yep, exactly as in @Makoto's answer :-) – Didier L Nov 10 '15 at 23:59
  • A kitchen knife can kill a person, but you don't go to a war with that! :). Java 8 streams are cool. But, try not to achieve everything using streams when regular loops can do better. A `forEach` would be more intuitive here. We are so obsessed with java 8 streams and try to solve everything using them. I observe lot of people(including me) try to achieve something like the above using `peek()` thought the API clearly mentions not to use it. – Diablo Nov 20 '20 at 16:52

10 Answers10

159

The important thing you have to understand is that streams are driven by the terminal operation. The terminal operation determines whether all elements have to be processed or any at all. So collect is an operation that processes each item, whereas findAny may stop processing items once it encountered a matching element.

And count() may not process any elements at all when it can determine the size of the stream without processing the items. Since this is an optimization not made in Java 8, but which will be in Java 9, there might be surprises when you switch to Java 9 and have code relying on count() processing all items. This is also connected to other implementation-dependent details, e.g. even in Java 9, the reference implementation will not be able to predict the size of an infinite stream source combined with limit while there is no fundamental limitation preventing such prediction.

Since peek allows “performing the provided action on each element as elements are consumed from the resulting stream”, it does not mandate processing of elements but will perform the action depending on what the terminal operation needs. This implies that you have to use it with great care if you need a particular processing, e.g. want to apply an action on all elements. It works if the terminal operation is guaranteed to process all items, but even then, you must be sure that not the next developer changes the terminal operation (or you forget that subtle aspect).

Further, while streams guarantee to maintain the encounter order for a certain combination of operations even for parallel streams, these guarantees do not apply to peek. When collecting into a list, the resulting list will have the right order for ordered parallel streams, but the peek action may get invoked in an arbitrary order and concurrently.

So the most useful thing you can do with peek is to find out whether a stream element has been processed which is exactly what the API documentation says:

This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline

Holger
  • 285,553
  • 42
  • 434
  • 765
  • 1
    will there be any problem, future or present, in OP's use case? Does his code always do what he wants? – ZhongYu Nov 10 '15 at 18:06
  • 11
    @bayou.io: as far as I can see, there is no problem in *this exact form*. But as I tried to explain, using it this way implies you have to remember this aspect, even if you come back to the code one or two years later to incorporate «feature request 9876» into the code… – Holger Nov 10 '15 at 18:09
  • 1
    "the peek action may get invoked in an arbitrary order and concurrently". Doesn't this statement go against their rule for how peek works, e.g. "as elements are consumed"? – Jose Martinez Jun 30 '17 at 16:17
  • 6
    @Jose Martinez: It says “as elements are consumed *from the resulting stream*”, which isn’t the terminal action but the processing, though even the terminal action could consume elements out of order as long as the final result is consistent. But I also think, the phrase of the API note, “*see the elements as they flow past a certain point in a pipeline*” does a better job at describing it. – Holger Jul 03 '17 at 10:42
110

The key takeaway from this:

Don't use the API in an unintended way, even if it accomplishes your immediate goal. That approach may break in the future, and it is also unclear to future maintainers.


There is no harm in breaking this out to multiple operations, as they are distinct operations. There is harm in using the API in an unclear and unintended way, which may have ramifications if this particular behavior is modified in future versions of Java.

Using forEach on this operation would make it clear to the maintainer that there is an intended side effect on each element of accounts, and that you are performing some operation that can mutate it.

It's also more conventional in the sense that peek is an intermediate operation which doesn't operate on the entire collection until the terminal operation runs, but forEach is indeed a terminal operation. This way, you can make strong arguments around the behavior and the flow of your code as opposed to asking questions about if peek would behave the same as forEach does in this context.

accounts.forEach(a -> a.login());
List<Account> loggedInAccounts = accounts.stream()
                                         .filter(Account::loggedIn)
                                         .collect(Collectors.toList());
Makoto
  • 104,088
  • 27
  • 192
  • 230
  • 6
    If you perform the login in a preprocessing step, you don’t need a stream at all. You can perform `forEach` right at the source collection: `accounts.forEach(a -> a.login());` – Holger Nov 10 '15 at 17:59
  • 1
    @Holger: Excellent point. I've incorporated that into the answer. – Makoto Nov 10 '15 at 18:00
  • Great answer! I was torn between this and what @Holger said. Both convey the same thing, but this seems a little clearer and has a nice example too. I'll follow this approach, even though I doubt there will be another developer using this, I myself may forget the intent of the function. – Adam.J Nov 10 '15 at 19:24
  • 2
    @Adam.J: Right, my answer focused more on the general question contained in your title, i.e. is this method really only for debugging, by explaining the aspects of that method. This answer is more fused on your actual use case and how to do it instead. So you could say, together they provide the full picture. First, the reason why this is not the intended use, second the conclusion, not to stick to an unintended use and what to do instead. The latter will have more practical use for you. – Holger Nov 10 '15 at 19:43
  • Yeah I can see that. Just a shame I can't accept bother answers hah. One this is for sure, both answers have definitely helped me understand why not to do it and how to achieve the same thing, in a more practical manor. – Adam.J Nov 10 '15 at 19:52
  • 1
    This works in OP's case, where the stream can be easily recreated. However, if the stream wasn't based on a collection, this approach wouldn't work so well. You could make an argument that if the stream is only available once, a semantically better way to write this would be `.map(a -> { account.login(); return account; })`, but that does seem a bit more verbose than a simple `.peek(Account::login)`. – Joshua Taylor Nov 10 '15 at 21:36
  • 1
    @JoshuaTaylor: If it's not a collection or an array, you're doing something with I/O and you're attempting to act on it in a way that was probably not intended. The main takeaway from this (besides the bolded one) would be to perform the transformation operation *before* you filter. Needing to transform *and* filter in the same step may be seen as an edge case or a code smell, and I'd lean more towards the latter. – Makoto Nov 10 '15 at 22:13
  • 1
    @Makoto " If it's not a collection or an array, you're doing something with I/O and you're attempting to act on it in a way that was probably not intended. " That's not necessarily true at all. I can write a method that takes a `Stream<...>` and then call it with `myCollection.stream()` without doing anything that's IO based. If the streams API is supposed to be useful, there's no real reason why methods might not take streams as arguments. – Joshua Taylor Nov 10 '15 at 22:18
  • 2
    Of course, it was much easier if the `login()` method returned a `boolean` value indicating the success status… – Holger Nov 11 '15 at 07:59
  • @Holger I was wondering if this was the correct way to do it or not? I have a few void methods which I later need to check they're success. I wasn't sure if this should be two methods or one which returned a boolean. Thanks for this. – Adam.J Nov 11 '15 at 09:25
  • @Holger Also, could I make login a predicate? It would log in, then return loggedIn? – Adam.J Nov 11 '15 at 10:38
  • 3
    That’s what I was aiming at. If `login()` returns a `boolean`, you can use it as a predicate which is the cleanest solution. It still has a side effect, but that’s ok as long as it is non-interfering, i.e. the `login` process` of one `Account` has no influence on the login` process` of another `Account`. – Holger Nov 11 '15 at 11:13
  • @Holger Thanks! That seems like the better solution! – Adam.J Nov 11 '15 at 11:19
  • How is this the accepted answer? We are iterating over the collection twice while technically only one iteration is required. I am currently facing a similar issue and am considering usage of a classic forEach Loop. What advantage does the usage of two streams have here? – Lukas Mar 28 '17 at 07:57
  • @Lukas: One stream introduces side-effects, and the other doesn't. If you blend streams which have side-effects in with streams that *shouldn't*, it can be come trickier to debug and maintain going forward. This is for clarity's sake. At the same time, I'd doubt there's much in terms of performance win by iterating over it once; you're still doing the same operations on it, with the second operation simply filtering out the unsuccessfully authenticated accounts. – Makoto Mar 28 '17 at 17:22
  • 1
    Obviously this solution is not good as it is forcing a functional way of writing where a non functional code makes so much more sense. Either the login function should return a logged in Account object or at least a boolean or the classic for each loop is a cleaner way to write it. If you have to iterate twice over the same collection then your code is broken. Please don't use the stream API for the sake of using it. – kaba713 Jul 03 '18 at 07:56
34

Perhaps a rule of thumb should be that if you do use peek outside the "debug" scenario, you should only do so if you're sure of what the terminating and intermediate filtering conditions are. For example:

return list.stream().map(foo->foo.getBar())
                    .peek(bar->bar.publish("HELLO"))
                    .collect(Collectors.toList());

seems to be a valid case where you want, in one operation to transform all Foos to Bars and tell them all hello.

Seems more efficient and elegant than something like:

List<Bar> bars = list.stream().map(foo->foo.getBar()).collect(Collectors.toList());
bars.forEach(bar->bar.publish("HELLO"));
return bars;

and you don't end up iterating a collection twice.

chimera8
  • 349
  • 3
  • 2
  • 4
    Iterating twice is O(2n) =~ O(n). The chances of you having a performance problem because of this are minimal. You do however score on clarity if you don't use peek. – GauravJ Nov 03 '20 at 23:49
10

A lot of answers made good points, and especially the (accepted) answer by Makoto describes the possible problems in quite some detail. But no one actually showed how it can go wrong:

[1]-> IntStream.range(1, 10).peek(System.out::println).count();
|  $6 ==> 9

No output.

[2]-> IntStream.range(1, 10).filter(i -> i%2==0).peek(System.out::println).count();
|  $9 ==> 4

Outputs numbers 2, 4, 6, 8.

[3]-> IntStream.range(1, 10).filter(i -> i > 0).peek(System.out::println).count();
|  $12 ==> 9

Outputs numbers 1 to 9.

[4]-> IntStream.range(1, 10).map(i -> i * 2).peek(System.out::println).count();
|  $16 ==> 9

No output.

[5]-> Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9).peek(System.out::println).count();
|  $23 ==> 9

No output.

[6]-> Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9).stream().peek(System.out::println).count();
|  $25 ==> 9

No output.

[7]-> IntStream.range(1, 10).filter(i -> true).peek(System.out::println).count();
|  $30 ==> 9

Outputs numbers 1 to 9.

[1]-> List<Integer> list = new ArrayList<>();
|  list ==> []
[2]-> Stream.of(1, 5, 2, 7, 3, 9, 8, 4, 6).sorted().peek(list::add).count();
|  $7 ==> 9
[3]-> list
|  list ==> []

(You get the idea.)

The examples were run in jshell (Java 15.0.2) and mimic the use case of converting data (replace System.out::println by list::add for example as also done in some answers) and returning how much data was added. The current observation is that any operation that could filter elements (such as filter or skip) seems to force handling of all remaining elements, but it need not stay that way.

René
  • 904
  • 13
  • 26
  • I'm not sure your results are reliable. As the `.count` terminal operation also produces output, JShell might be overwriting the output of the .peek operation with it. If you replace .count with another terminal operation that doesn't produce output, it works nicely, e.g. `jshell> IntStream.range(1,10).peek(System.out::println).forEach(i->{})`. – ThomasH Apr 23 '21 at 05:28
  • 2
    Having count as the terminal operation here is exactly the problem I want to show. count is not interested in your actual elements which is why they are sometimes not processed and the count just calculated. – René Apr 23 '21 at 14:22
  • Ah, ok, now I see. – ThomasH Apr 23 '21 at 19:10
  • Just for anyone wondering how does the `count()` method works without actually calculating number of elements in the stream, I strongly believe it is because of flag `StreamOpFlag.SIZED` being set by `IntStream` and `Stream.of`. What is worse, `Stream.of` behaviour differs between JVM versions: in 1.8 it used to be a normal stream, but in some latter version it became `SIZED`, iirc. – Xobotun Feb 18 '22 at 09:26
8

Although I agree with most answers above, I have one case in which using peek actually seems like the cleanest way to go.

Similar to your use case, suppose you want to filter only on active accounts and then perform a login on these accounts.

accounts.stream()
    .filter(Account::isActive)
    .peek(login)
    .collect(Collectors.toList());

Peek is helpful to avoid the redundant call while not having to iterate the collection twice:

accounts.stream()
    .filter(Account::isActive)
    .map(account -> {
        account.login();
        return account;
    })
    .collect(Collectors.toList());
UltimaWeapon
  • 690
  • 9
  • 8
  • 5
    All you have to do is to get that login method right. I really don't see how the peek is the cleanest way to go. How should someone who is reading your code know that you actually misusing the API. Good and clean code doesn't force a reader to make assumptions about the code. – kaba713 Jul 03 '18 at 08:01
  • I think you need to qualify the method reference in the `.peek` operation, e.g. as `Account::login`, for it to work. – ThomasH Apr 23 '21 at 05:02
  • 1
    I agree that using `.peek` in favor of the `.map` alternative is more succinct, expressive and understandable. The lambda in the .map is only necessary to return the passed in object. .peek does this on its own. I know that as soon as I read the operation name, and don't have to inspect the lambda to find it out. – ThomasH Apr 23 '21 at 05:12
8

I would say that peek provides the ability to decentralize code that can mutate stream objects, or modify global state (based on them), instead of stuffing everything into a simple or composed function passed to a terminal method.

Now the question might be: should we mutate stream objects or change global state from within functions in functional style java programming?

If the answer to any of the the above 2 questions is yes (or: in some cases yes) then peek() is definitely not only for debugging purposes, for the same reason that forEach() isn't only for debugging purposes.

For me when choosing between forEach() and peek(), is choosing the following: Do I want pieces of code that mutate stream objects (or change global state) to be attached to a composable, or do I want them to attach directly to stream?

I think peek() will better pair with java9 methods. e.g. takeWhile() may need to decide when to stop iteration based on an already mutated object, so paring it with forEach() would not have the same effect.

P.S. I have not mentioned map() anywhere because in case we want to mutate objects (or global state), rather than generating new objects, it works exactly like peek().

Marinos An
  • 9,481
  • 6
  • 63
  • 96
  • 1
    **Not true at all.** According to its JavaDocs, the intermediate Stream operation java.util.Stream.peek() “exists mainly to support debugging” purposes. – Hannes Schneidermayer Jun 08 '22 at 14:38
7

Despite the documentation note for .peek saying the "method exists mainly to support debugging" I think it has general relevance. For one thing the documentation says "mainly", so leaves room for other use cases. It is not deprecated since years, and speculations about its removal are futile IMO.

I would say in a world where we still have to handle side-effectful methods it has a valid place and utility. There are many valid operations in streams that use side-effects. Many have been mentioned in other answers, I'll just add here to set a flag on a collection of objects, or register them with a registry, on objects which are then further processed in the stream. Not to mention creating log messages during stream processing.

I support the idea to have separate actions in separate stream operations, so I avoid pushing everything into a final .forEach. I favor .peek over an equivalent .map with a lambda who's only purpose, besides calling the side-effect method, is to return the passed in argument. .peek tells me that what goes in also goes out as soon as I encounter this operation, and I don't need to read a lambda to find out. In that sense it is succinct, expressive and improves readability of the code.

Having said that I agree with all the considerations when using .peek, e.g. being aware of the effect of the terminal operation of the stream it is used in.

ThomasH
  • 22,276
  • 13
  • 61
  • 62
2

The functional solution is to make account object immutable. So account.login() must return a new account object. This will mean that the map operation can be used for login instead of peek.

Solubris
  • 3,603
  • 2
  • 22
  • 37
1

To get rid of warnings, I use functor tee, named after Unix' tee:

public static <T> Function<T,T> tee(Consumer<T> after) {
    return arg -> {
        f.accept(arg);
        return arg;
    };
}

You can replace:

  .peek(f)

with

  .map(tee(f))
Jens Jensen
  • 1,038
  • 1
  • 10
  • 20
0

It seems like a helper class is needed:

public static class OneBranchOnly<T> {
    public Function<T, T> apply(Predicate<? super T> test,
                                Consumer<? super T> t) {
        return o -> {
            if (test.test(o)) t.accept(o);
            return o;
        };
    }
}

then switch peek with map:

.map(new OneBranchOnly< Account >().apply(                   
                    account -> account.isTestAccount(),
                    account -> account.setName("Test Account"))
)

results: Collections of accounts that only test accounts got renamed (no reference gets maintained)

Jay Ehsaniara
  • 1,421
  • 17
  • 24