How to perform Stream functions on an Iterable?

Question

In Java 8, the Stream class does not have any method to wrap an Iterable.

Instead, I am obtaining the Spliterator from the Iterable and then obtaining a Stream from StreamSupport like this:

boolean parallel = true;

StreamSupport.stream(spliterator(), parallel)
                .filter(Row::isEmpty)
                .collect(Collectors.toList())
                .forEach(this::deleteRow);

Is there some other way of generating Stream operations on an Iterable that I am missing?

What's the problem with your way of doing it? Most Iterables are instances of Collection, and Collection has stream() and parallelStream(). — JB Nizet, Dec 01 '13 at 08:33
So the question is why `stream()` is not pulled up into Iterable? — Andrey Chaschev, Dec 01 '13 at 08:58
Answer here: http://stackoverflow.com/questions/23114015/why-does-iterablet-not-provide-stream-and-parallelstream-methods?lq=1 — Brian Goetz, Jun 01 '14 at 02:10

score 37 · Accepted Answer · answered Apr 04 '14 at 18:21

My similar question got marked as a duplicate, but here are the helper methods I've used to avoid some of the boilerplate:

public static <T> Stream<T> stream(Iterable<T> in) {
    return StreamSupport.stream(in.spliterator(), false);
}

public static <T> Stream<T> parallelStream(Iterable<T> in) {
    return StreamSupport.stream(in.spliterator(), true);
}
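
For context, here is a minimal, self-contained sketch of how the sequential helper above might be used. The class name `IterableStreams` and the sample data are hypothetical, chosen only for illustration; the variable is declared as `Iterable` on purpose so that the list's own `stream()` method is not visible.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class IterableStreams {

    // Sequential stream over any Iterable, built from its spliterator().
    static <T> Stream<T> stream(Iterable<T> in) {
        return StreamSupport.stream(in.spliterator(), false);
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two of the four rows are empty.
        Iterable<String> rows = Arrays.asList("a", "", "b", "");

        List<String> emptyRows = stream(rows)
                .filter(String::isEmpty)
                .collect(Collectors.toList());

        System.out.println(emptyRows.size()); // prints 2
    }
}
```

Because `spliterator()` is dispatched on the runtime type, an `Iterable` that happens to be a `List` still supplies the list's own spliterator here.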

score 1 · Answer 2 · edited May 23 '17 at 12:10

What you describe is the way to get a stream from an Iterable. That's why they added the spliterator() method to Iterable. I've done the same conversion myself and have not seen another way.

[UPDATE] Maybe this other answer will shed some light on the "why."
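
As a rough illustration of the conversion described above, the sketch below streams an `Iterable` that is not a `Collection` and relies only on the `spliterator()` default method it inherits from `Iterable`. The `Countdown` class is a hypothetical example type, not something from the JDK or from the question.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical Iterable that counts down from 'start' to 1.
class Countdown implements Iterable<Integer> {
    private final int start;

    Countdown(int start) {
        this.start = start;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int next = start;

            @Override
            public boolean hasNext() {
                return next > 0;
            }

            @Override
            public Integer next() {
                if (next <= 0) {
                    throw new NoSuchElementException();
                }
                return next--;
            }
        };
    }

    // spliterator() is not overridden: the default from Iterable is used.

    public static void main(String[] args) {
        List<Integer> evens = StreamSupport
                .stream(new Countdown(6).spliterator(), false)
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());

        System.out.println(evens); // prints [6, 4, 2]
    }
}
```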

answered Dec 01 '13 at 10:38 by Jason (last edited by Community)

Why do they make us suffer ... :P – The Coordinator Dec 01 '13 at 14:50
@SaintHill Because Java 8's unofficial motto is "by pedants, for pedants". I'm kind of afraid it is the direct result of being open sourced. That, and taking five years to ship. – Aleksandr Dubinsky Dec 18 '13 at 14:24
LOL. I am still miffed they couldn't have made life easier for developers by adding sensible defaults like `toList` without straining my fingers to type .collect(Collectors.toList()) !!!! – The Coordinator Dec 20 '13 at 14:12
I think the intention was that you would statically import the Collectors, so you would only have to type .collect(toList()). Having too many things in the public API can somewhat become noise. I would not have liked to see all of the Collectors.* attached to the stream API. – dsingleton Mar 27 '14 at 00:09
@Jay Your link to the other answer explains a different question, which is why `Stream` is not itself an `Iterable`. The rationale for that does not apply here. – Marko Topolnik Apr 05 '14 at 12:03

score 1 · Answer 3 · answered Mar 27 '14 at 00:02

I know this doesn't directly answer your question, but many Iterable sources, such as collections, now have a method that returns their contents as a stream.

I think the friction you will run into here is that Iterable is semantically serial, whereas Spliterators are meant for parallel processing. If the JDK does not already provide one, it is probably better to implement a Spliterator for the underlying data source you are interested in, because simply wrapping the Iterable will not let you gain the benefits that the Stream API provides (such as parallel processing).
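
One possible way to follow that suggestion, sketched under the assumption that the data lives in an array: override `spliterator()` on the Iterable so that the stream gets a sized, splittable source instead of the default iterator-backed one. The `Rows` class and its contents are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.stream.StreamSupport;

// Hypothetical array-backed Iterable that supplies its own Spliterator.
class Rows implements Iterable<String> {
    private final String[] data;

    Rows(String... data) {
        this.data = data.clone();
    }

    @Override
    public Iterator<String> iterator() {
        return Arrays.asList(data).iterator();
    }

    // Replaces the default from Iterable with an array spliterator,
    // which reports its size and can split into balanced halves.
    @Override
    public Spliterator<String> spliterator() {
        return Arrays.spliterator(data);
    }

    public static void main(String[] args) {
        Rows rows = new Rows("a", "", "b", "");

        // The overridden spliterator knows how many elements it covers.
        System.out.println(rows.spliterator().hasCharacteristics(Spliterator.SIZED)); // prints true

        long empty = StreamSupport.stream(rows.spliterator(), true)
                .filter(String::isEmpty)
                .count();
        System.out.println(empty); // prints 2
    }
}
```

Whether this is worth the effort depends on the source; for a genuinely unsized source there is no size to report, and the default may be all that is available.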

answered Mar 27 '14 at 00:02 by dsingleton

But what is the logic behind allowing `collection.stream()`, but not `iterable.stream()`, while at the same time providing `iterable.spliterator()`? Your answer would apply if `Iterable` did not provide `spliterator()`. – Marko Topolnik Apr 05 '14 at 12:04
I'm guessing that it is there to discourage but not prohibit. You could mark a stream as parallel that was backed by an iterator, but you probably wouldn't see the benefits. – dsingleton Apr 07 '14 at 21:28
Also, look at the @implNote that they put on the spliterator method of Iterable: "The default implementation should usually be overridden. The spliterator returned by the default implementation has poor splitting capabilities, is unsized, and does not report any spliterator characteristics. Implementing classes can nearly always provide a better implementation." — dsingleton Apr 07 '14 at 21:29
a) there are many more benefits to using streams than parallel computing; b) of course the default spliterator is what you say, and of course it should be overridden in a sized Collection. But implementation notes have nothing to do with interface definition. I am deprived of a `stream` method on an arbitrary `Iterable` and have to hack my way through to it. – Marko Topolnik Apr 08 '14 at 07:41
Also note that for a truly unsized source such as the CSV body of an HTTP request, a `ResultSet`, or a `BufferedReader`, the default implementation is almost the best you can get, and *does* permit full-blown parallelism. It is only bad in the context of sized streams. — Marko Topolnik Apr 08 '14 at 07:43
a) Yeah, I know. I was just using it to make a point. It doesn't change the fact that it still does not work very well for that use case. b) It's not what I say, it is the documentation. c) Hack your way through it? You provide the spliterator to the StreamSupport class. That's not that hard. You are right, it is not a first-class method on the Iterable interface, but it is not like you are being asked to do a lot of extra work. It could have been an oversight that they did not provide it, or they could have deliberately chosen not to, I don't know. — dsingleton Apr 08 '14 at 15:00
Usually, people tend to be curious *why* it was deliberately left out. They may also learn something from it. I am one of those people. – Marko Topolnik Apr 08 '14 at 19:04
The first point of your answer is in fact yet another argument *in favor* of `Iterable#stream`. But, as it stands, you have to manually check if the passed-in `Iterable` happens to be one of those subtypes, in order to leverage the `stream` method it provides. – Marko Topolnik Apr 08 '14 at 19:07
`Iterable#spliterator` does provide ("limited", as they say) parallelism which is actually quite good -- indeed, excellent -- for many pipelines. This fact mostly discredits your second point about `Iterable` being "semantically serial". Your last sentence culminates in an outright falsehood: I have personally been using `Iterable`'s *default* Spliterator to achieve very good parallelism. – Marko Topolnik Apr 08 '14 at 19:28
I am not saying that I am not curious. I was just pointing out to the person asking the question a potential cause or reason why it was left out. I will admit, when I wrote this, I was thinking in terms of a sized data source, in which case I do think that my statement still holds true. And iterable _is_ semantically serial. Spliterator is not. Iterators serially traverse data. There are ways of using them in parallel, but that is by splitting the data and traversing in a serial fashion on subsets, which does not change the semantics of the Iterable. – dsingleton Apr 08 '14 at 22:42
