Is it a good idea to substitute Collection for Stream in return values?

Question

Up until Java 8, a property representing a collection of elements usually returned a Collection. In the absence of an immutable collection interface, a common idiom would be to wrap it as:

Collection<Foo> getFoos(){ return Collections.unmodifiableCollection(foos); }

Now that Stream is here, it is tempting to start exposing Streams instead of Collections.
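As a rough sketch of the two styles being compared, the same data could be exposed either way. The element type Foo and the owner class FooRegistry are hypothetical names used only for illustration; only the getFoos() idiom comes from the question.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical element type, for illustration only.
final class Foo {
    final String name;

    Foo(String name) {
        this.name = name;
    }
}

// Hypothetical owner class exposing the same data in both styles.
final class FooRegistry {
    private final List<Foo> foos = new ArrayList<>();

    void add(Foo foo) {
        foos.add(foo);
    }

    // Pre-Java 8 idiom: a read-only view over the internal list.
    // Mutating calls on the returned view throw UnsupportedOperationException.
    Collection<Foo> getFoos() {
        return Collections.unmodifiableCollection(foos);
    }

    // Stream-returning alternative: every call creates a fresh stream.
    Stream<Foo> foos() {
        return foos.stream();
    }
}
```

With the stream variant, a caller who needs to traverse the elements twice calls foos() twice, since each returned stream can be consumed only once.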

The benefits as I see them:

A truly immutable API
More often than not, the client of such a property is interested in querying or iterating the result (it would be really terrible if it wanted to make updates to the collection).

On the other hand, Streams can be consumed only once, and cannot be passed around like regular collections. This is particularly worrisome.

This question is different from a similar-looking question: it is broader, in the sense that the OP there explicitly stated that the streams he intends to return are not going to be passed around. In my opinion this aspect was not addressed in the answers to the original question.

To put it in other words: it seems to me that if an API returns a stream, the general mindset should be that all interaction with it must terminate in the immediate context. It should be forbidden to pass the stream around.

But, it seems like this is very hard to enforce, unless developers are very familiar with the Stream API. This implies that this kind of API requires a paradigm shift. Am I right about this assertion?

Please consider reopening the question; I explained why I think it is relevant. — Vitaliy, Feb 16 '15 at 19:10

score 3 · Accepted Answer · answered Feb 19 '15 at 23:20

Let me propose a simple rule:

A Stream that is passed as a method argument or returned as a method's return value must be the tail of an unterminated pipeline.

This is probably so obvious to those of us who have worked on streams that we never bothered to write it down. But it's probably not obvious to people approaching streams for the first time, so it's likely worth a discussion.

The main rule is covered in the Streams API package documentation: a stream can have at most one terminal operation. Once it's been terminated, it's illegal to add any intermediate or terminal operations.

The other rule is that stream pipelines must be linear; they cannot have branches. This isn't terribly clearly documented, but it is mentioned in the Stream class documentation about two-thirds of the way down. This means that it's illegal to add an intermediate or terminal operation to a stream if it isn't the last operation on the pipeline.

Most of the stream methods are either intermediate or terminal operations. If you attempt to use one of these on a stream that's terminated or that's not the last operation, you find out pretty quickly by getting an IllegalStateException. This does happen occasionally, but I think that once people get the idea that a pipeline has to be linear, they learn to avoid this issue, and the problem goes away. I think this is pretty easy for most people to grasp; it shouldn't require a paradigm shift.
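A minimal sketch of both rules follows. It assumes the standard JDK stream implementation; the specification only says that an implementation may throw IllegalStateException when it detects reuse. The class and method names are made up for illustration.

```java
import java.util.stream.Stream;

final class StreamReuseDemo {

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondTerminalOperationFails() {
        Stream<String> s = Stream.of("a", "b", "c");
        s.count(); // first terminal operation: fine
        try {
            s.count(); // the stream has already been consumed
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Returns true if adding a second operation to a non-tail stream is rejected.
    static boolean branchingFails() {
        Stream<String> source = Stream.of("a", "b", "c");
        // After this call, 'tail' is the last stage of the pipeline, not 'source'.
        Stream<String> tail = source.filter(x -> !x.equals("b"));
        try {
            source.map(String::toUpperCase); // attempts to branch off 'source'
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

In both methods the failure shows up at the offending call itself, which is why such mistakes tend to be found quickly.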

Once you understand this, it's clear that if you're going to hand a Stream instance to another piece of code -- either by passing it as an argument, or returning it to the caller -- it needs to be a stream source or the last intermediate operation in a pipeline. That is, it needs to be the tail of an unterminated pipeline.

"To put it in other words: it seems to me that if an API returns a stream, the general mindset should be that all interaction with it must terminate in the immediate context. It should be forbidden to pass the stream around."

I think this is too restrictive. As long as you adhere to the rule I proposed, you should be free to pass the stream around as much as you want. Indeed, there are a bunch of use cases for getting a stream from somewhere, modifying it, and passing it along. Here are a couple of examples.

1) Open a text file containing the textual representation of a POJO on each line. Call Files.lines() to get a Stream<String>. Map each line into a POJO instance, and return a Stream<POJO> to the caller. The caller might apply a filter or a sort operation and return the stream to its caller.

2) Given a Stream<POJO>, you might want to have a web interface to allow the user to provide a complex set of search criteria. (For example, consider a shopping site with lots of sorting and filtering options.) Instead of composing a big complex pipeline in code, you might have a method like the following:

Stream<POJO> applyCriteria(Stream<POJO> stream, SearchCriteria criteria)

which would take a stream, apply the search criteria by appending various filters, and possibly sort or distinct operations, and return the resulting stream to the caller.
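One way such a method might look is sketched below. The Product type stands in for the POJO, and the fields of SearchCriteria are invented for illustration; the answer does not specify either.

```java
import java.util.stream.Stream;

// Hypothetical POJO standing in for the items being searched.
final class Product {
    final String name;
    final String category;
    final double price;

    Product(String name, String category, double price) {
        this.name = name;
        this.category = category;
        this.price = price;
    }
}

// Hypothetical criteria holder; null or false means "not requested".
final class SearchCriteria {
    String category;     // null = any category
    Double maxPrice;     // null = no price limit
    boolean sortByPrice; // false = keep encounter order
}

final class ProductSearch {

    // Appends only the requested intermediate operations and returns
    // the new tail of the still-unterminated pipeline.
    static Stream<Product> applyCriteria(Stream<Product> products, SearchCriteria criteria) {
        Stream<Product> s = products;
        if (criteria.category != null) {
            s = s.filter(p -> p.category.equals(criteria.category));
        }
        if (criteria.maxPrice != null) {
            s = s.filter(p -> p.price <= criteria.maxPrice);
        }
        if (criteria.sortByPrice) {
            s = s.sorted((a, b) -> Double.compare(a.price, b.price));
        }
        return s;
    }
}
```

Each reassignment of s keeps the pipeline linear: the method only ever holds on to the most recent stage, and no terminal operation runs until the caller decides to invoke one.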

From these examples, I hope you can see that there is considerable flexibility in passing streams around, as long as what you pass around is always the tail of an unterminated pipeline.
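The first example (reading POJOs from a text file) might be sketched roughly as follows. The Person type, its "name,age" line format, and the method names are assumptions made for illustration. Because Files.lines() holds an open file, whoever finally consumes the stream should close it, for instance with try-with-resources.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical POJO; each line of the file is assumed to look like "name,age".
final class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    static Person parse(String line) {
        String[] parts = line.split(",");
        return new Person(parts[0].trim(), Integer.parseInt(parts[1].trim()));
    }
}

final class PersonFile {

    // Returns the tail of an unterminated pipeline backed by an open file.
    // The eventual consumer is responsible for closing the stream.
    static Stream<Person> read(Path path) {
        try {
            return Files.lines(path).map(Person::parse);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // An intermediate caller appends operations and passes the stream along.
    static Stream<Person> adultsByName(Stream<Person> people) {
        return people
                .filter(p -> p.age >= 18)
                .sorted((a, b) -> a.name.compareTo(b.name));
    }
}
```

A final consumer could write try (Stream<Person> s = PersonFile.adultsByName(PersonFile.read(path))) { ... }; closing the last stage also closes the underlying file, since all stages belong to the same pipeline.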

The rules you propose are absolutely sensible; the problem is that they increase the mental burden. In the past, when everybody simply worked with Collections, developers never had to think twice before calling a method. They did not have to tread carefully around it. Moreover, given a sequence, it is often convenient to traverse it more than once, a few lines of code apart. Now, seemingly harmless operations suddenly become illegal, for reasons that are not immediately apparent. This relates to our previous correspondence about the one-off nature of streams. (continued in next comment) — Vitaliy, Mar 02 '15 at 18:52
In a world where everybody is aware of this, and devs do not get tired and always pay careful attention to what they write, this can be pulled off. But I feel that in our day-to-day reality this is a recipe for annoying bugs or, even worse, emerging idioms that pass along a means to regenerate the stream at each point to achieve such functionality. Or even wrapper streams that simply cancel out the one-off nature of streams, thus going against the language's intentions. Stuart, is there a way we could converse in a more direct manner (email?)? I feel this platform is too restricting. — Vitaliy, Mar 02 '15 at 18:56
I've created a chat room for this: http://chat.stackoverflow.com/rooms/72090/return-collection-or-stream — Stuart Marks, Mar 02 '15 at 20:50
@Vitaliy Whoops, I forgot to @-notify you in my comment yesterday. Trying again. — Stuart Marks, Mar 04 '15 at 04:24
Put some stuff in the chat room, hope you have not forsaken me ;-) — Vitaliy, Mar 10 '15 at 08:10
@Vitaliy Hi, sorry, I didn't get a chat notification. I replied in chat. — Stuart Marks, Mar 10 '15 at 17:37

score 0 · Answer 2 · answered Feb 17 '15 at 07:49

It depends:

If you do return Streams from your methods, you always need to make sure that they are not already consumed or closed when returning.

Using Streams in your application's API will increase the probability that the users of your application will also pass around Streams instead of Collections, which implies that they also need to keep in mind that they shouldn't return already consumed or closed Streams.

In private projects using Streams would probably work, but if you're building a public API I would not consider returning Streams as a good idea.

Personally, I prefer Iterables to Collections because of their immutability. I've created a wrapper called Enumerables to extend Iterable with a functional API similar to the one Stream has.
