12

I have a situation where I am reading a database and returning a List<String>, where each string is selected and added to the list according to some criteria. The method signature is:

public List<String> myMethod(String query, int limit)

The second parameter provides an upper bound on the size of the returned list (setting limit = -1 removes any size restriction). To avoid making this method memory-intensive, I have written an equivalent method that returns Stream<String> instead of a list. (Note: I don't need random access to the returned elements or any other list-specific functionality.)

However, I am a bit skeptical about returning a Stream<>, especially since the method is public. Is it safe to have a public method returning a Stream<> in Java?

Luiggi Mendoza
Chthonic Project
  • 2
    Under some circumstances it can be clearer and easier to return an Iterable/Iterator. I often build an `Iterator` in my code. Making a `Stream` from an `Iterator` is quite [simple](http://stackoverflow.com/q/21956515/823393). – OldCurmudgeon Jan 15 '15 at 16:48
  • 2
    @OldCurmudgeon If you return `Stream`, you get `Iterator` almost for free (`stream.iterator()`). – Marko Topolnik Jan 15 '15 at 16:53
  • @Marko - Agreed - but you are still backwards-compatible with Java 7 and earlier if you then offer a simple adapter to stream the `Iterator` in Java 8. – OldCurmudgeon Jan 15 '15 at 16:55
  • 2
    @OldCurmudgeon `Iterator` is not `Closeable` and that causes quite some headaches for I/O-backed iterators. – Marko Topolnik Jan 15 '15 at 17:00
  • 3
    Seems to be similar to [“Should I be exposing Stream on my interface?”](http://stackoverflow.com/q/27179175/2711488) and [“Should I return a Collection or a Stream?”](http://stackoverflow.com/q/24676877/2711488) – Holger Jan 15 '15 at 19:47
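
The two conversions discussed in the comments above can be sketched as follows. This is only an illustration: the class name `IteratorAdapters` and its method names are hypothetical, while `StreamSupport.stream`, `Spliterators.spliteratorUnknownSize` and `Stream.iterator()` are standard Java 8 API.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class, not part of the original question.
class IteratorAdapters {

    // Wraps an Iterator in a sequential, ordered Stream of unknown size.
    static <T> Stream<T> stream(Iterator<T> iterator) {
        Spliterator<T> spliterator =
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED);
        return StreamSupport.stream(spliterator, false); // false = not parallel
    }

    // The opposite direction is built into the Stream API.
    static <T> Iterator<T> iterator(Stream<T> stream) {
        return stream.iterator();
    }
}
```

Note that neither adapter carries over resource handling: the Iterator obtained from a Stream does not close the stream when it is exhausted.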

2 Answers

14

Not only is it safe, it is recommended by Brian Goetz, the Java language architect.

Especially if your data is I/O-based and thus not yet materialized in memory at the time myMethod is called, it would be highly advisable to return a Stream instead of a List. The client may need to only consume a part of it or aggregate it into some data of fixed size. Thus you have the chance to go from O(n) memory requirement to O(1).
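
As a rough sketch of the memory argument, assume a hypothetical stand-in for the database that produces rows on demand. The class name `LazyQueryDemo`, the generated row values, and the handling of a negative limit are all assumptions made for illustration; only `myMethod(String query, int limit)` mirrors the signature from the question.

```java
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical demo class; the "database" is simulated by a lazy generator.
class LazyQueryDemo {

    // Returns rows lazily; a negative limit means "no size restriction".
    static Stream<String> myMethod(String query, int limit) {
        Stream<String> rows = Stream.iterate(0, i -> i + 1)  // endless simulated source
                .map(i -> query + "-" + i);
        return limit < 0 ? rows : rows.limit(limit);
    }

    public static void main(String[] args) {
        // Aggregates 1000 rows into a single int without holding them all in memory.
        int totalLength = myMethod("row", 1000).mapToInt(String::length).sum();
        System.out.println(totalLength);

        // Consumes only the first row, even though the source is unbounded.
        Optional<String> first = myMethod("row", -1).findFirst();
        System.out.println(first.orElse("<none>"));
    }
}
```

A List-returning variant would have to materialize every row before the caller could do either of these things.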

Note that if parallelization is also an interesting idea for your use case, you would be advised to use a custom spliterator whose splitting policy is adapted to the sequential nature of I/O data sources. In this case I can recommend a blog post of mine which presents such a spliterator.

Marko Topolnik
  • 1
    Thank you for the link to Brian Goetz' answer. I have just one remaining doubt about safety: returning a `Stream<>` in a public method means that I can't ensure it will be closed. I noticed that the [Javadoc](http://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html) says "nearly all stream instances do not actually need to be closed after use", but wasn't able to find more details on that suspicious word 'nearly'. – Chthonic Project Jan 15 '15 at 16:52
  • 2
    @ChthonicProject Well, it's very simple: if your stream is backed by an I/O resource, then it definitely needs closing. This is a choice you must make---eagerly copy everything onto heap, releasing the I/O resource, or have a stream which needs to be closed. But I would argue for the following: it is trivial to have a method which safely converts an I/O-backed stream into an on-heap List; the other direction is impossible. So you add nothing but flexibility by going with the Stream. – Marko Topolnik Jan 15 '15 at 16:54
  • 1
    Didn't notice the important sentence that follows in the same paragraph in the Javadoc. Sharing it here in order to spread the joy: "*If a stream does require closing, it can be declared as a resource in a try-with-resources statement.*" That, together with your comment (and of course, the main answer), ends all my doubts and hesitations. – Chthonic Project Jan 15 '15 at 17:03
  • 2
    Yes, streams are `AutoCloseable`. – Marko Topolnik Jan 15 '15 at 17:04
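
The closing discussion in the comments above can be illustrated with a minimal sketch. The class `ClosableStreamDemo`, the `resourceOpen` flag (standing in for a database cursor or file handle) and both method names are hypothetical; `Stream.onClose` and try-with-resources on a `Stream` are standard Java 8.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the I/O resource is simulated by a boolean flag.
class ClosableStreamDemo {

    // Stands in for an open cursor or file handle.
    static final AtomicBoolean resourceOpen = new AtomicBoolean(false);

    // Returns a stream that releases the simulated resource when it is closed.
    static Stream<String> openRows() {
        resourceOpen.set(true);
        return Stream.of("a", "b", "c").onClose(() -> resourceOpen.set(false));
    }

    // Copies an I/O-backed stream onto the heap and closes it afterwards.
    static List<String> toList(Stream<String> rows) {
        try (Stream<String> s = rows) {
            return s.collect(Collectors.toList());
        }
    }
}
```

A caller that wants the old List behavior can use `toList(openRows())`, while a caller that wants laziness keeps the stream and wraps its own pipeline in try-with-resources.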
0

I believe that, as a default, you should avoid Stream in your public method interfaces, because streams are dangerous to consume; see “How to safely consume Java Streams without isFinite() and isOrdered() methods?”

Basically, a client that calls your method and receives the stream has to make sure that its algorithm does not break (or at least breaks in its integration tests) when your implementation changes the characteristics of the returned stream. That is difficult to do, because stream characteristics are easy to overlook.

So I would only consider Stream as a return value if the data that you return is not materialized yet and you want to leave it to your clients to decide how to materialize it. But even then, an Iterable or an Iterator seems like a better choice, because it comes without the unnecessary parallel-processing baggage that streams have and that defensive programming needs to guard against.

As an example, when returning a List, your clients know that the returned datatype is finite and ordered, and that iterating over it will not unexpectedly run in parallel on the ForkJoinPool, possibly breaking your whole application. With Stream, you have to call sequential() to guard against this possibility.
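
A minimal sketch of that guard, assuming client code that writes into a non-thread-safe collection. The class `SequentialGuardDemo` and its `consume` method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical client code that must not run in parallel.
class SequentialGuardDemo {

    // Appends to a plain ArrayList, which is not safe for concurrent writes,
    // so the stream is forced to sequential mode before it is consumed.
    static List<String> consume(Stream<String> rows) {
        List<String> out = new ArrayList<>();
        rows.sequential().forEachOrdered(out::add);
        return out;
    }
}
```

Because `sequential()` applies to the whole pipeline, this works even if the producer handed over a parallel stream.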

If the data source needs closing after consumption, I would prefer a variant of InputStream over Stream, because implementors will remember well that they need to close the stream (and static checkers will remind them).

tkruse