
The Java Stream.forEach function has the serious limitation that it's impossible for its consumer to throw checked exceptions. As such, I would like to access a Stream's elements one by one.

I want to do something like this:

while(true) {
    Optional<String> optNewString = myStream.findAny();

    if (optNewString.isPresent())
        doStuff(optNewString.get());
    else
        break;
}

However, findAny is a short-circuiting terminal operation, and a stream can only be consumed by one terminal operation. This code would therefore fail on the second iteration of the while loop (in practice with an IllegalStateException). I cannot simply collect all elements into an array and go over that array one by one, because there are potentially tens of millions of elements.
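As a minimal sketch of the failure described above (the class and method names are made up for illustration, and the exact exception is what the OpenJDK implementation does rather than a hard guarantee of the API), a second terminal operation on the same stream is rejected:

```java
import java.util.Optional;
import java.util.stream.Stream;

public class StreamReuseDemo {

    // Hypothetical helper: returns true if a second terminal operation
    // on an already-consumed stream is rejected.
    public static boolean secondFindAnyFails() {
        Stream<String> myStream = Stream.of("a", "b", "c");
        Optional<String> first = myStream.findAny(); // first terminal operation consumes the stream
        try {
            myStream.findAny(); // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            // Stream reuse was detected
            return first.isPresent();
        }
    }

    public static void main(String[] args) {
        System.out.println("second findAny() rejected: " + secondFindAnyFails());
    }
}
```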

Please note I am not asking how to throw exceptions from within forEach. This question has already been answered.

Jeroen
  • You'll simply have to accept that checked exceptions were a bad idea, and don't fit well with streams and functional programming. So catch them and wrap them in runtime exceptions. – JB Nizet Nov 01 '17 at 17:57
  • @JBNizet Sure, that works, and is what I am currently doing using a `LambdaException` type. Just because it works doesn't mean it's well written though, and there is always room for improvements. Furthermore, not being able to wait for the next element of a stream seems to be an API hole. – Jeroen Nov 01 '17 at 17:58
  • [This](https://stackoverflow.com/questions/20129762/why-does-streamt-not-implement-iterablet) could be of use – jrtapsell Nov 01 '17 at 18:02
  • Any reason you don't want to use `iterator()` and just iterate that way? – Jon Skeet Nov 01 '17 at 18:02
  • @JonSkeet Not particularly, but `iterator()` does not mention its space complexity, and given that it provides more functionality than a `Stream`, like removing elements, I am unsure what to expect from it, especially for different types of streams. I am working with hundreds of millions of elements here, so if it ever does something similar to creating an `ArrayList` internally, I'm in trouble. – Jeroen Nov 01 '17 at 18:04
  • @JeroenBollen: But you wouldn't use that - you're just trying to iterate, one element at a time, aren't you? That's what an iterator is for. (I wouldn't expect stream iterators to support removal.) – Jon Skeet Nov 01 '17 at 18:05
  • @JonSkeet The stream iterator implements a lot of functionality that does not apply to `Stream`s. `hasNext`, `remove`, etc. – Jeroen Nov 01 '17 at 18:12
  • How does `hasNext` not apply to streams? Sure, `remove` isn't supported (I'd expect) but `hasNext` should be fine. Have you *tried* using `iterator()` to see whether it does what you want? – Jon Skeet Nov 01 '17 at 18:13
  • @JonSkeet A `Stream` produces elements, it does not iterate over elements. It does not know how many elements it will produce. It only knows if there is a next one, if the next one has been computed. I tried using `Stream.iterator` just now, but as I expected the memory usage was way too high. It seems to be collecting the `Stream` into memory and then iterating over the memory. – Jeroen Nov 01 '17 at 18:54
  • That surprises me, based on my experience of the nearest equivalent in .NET. I'd have *expected* it to iterate lazily etc. (I can't speak to your specific situation as we have no idea what kind of stream you're creating etc.) I wouldn't expect `Stream.forEach` to require the whole stream to be created in memory in one go either... – Jon Skeet Nov 01 '17 at 18:58
  • @JonSkeet Please note that the `Stream` is not being moved into memory unless `Stream.iterator` is called. With `Stream.iterator`, my memory usage went to 2.5GiB before I had to kill it. – Jeroen Nov 01 '17 at 19:01
  • It would help if you'd provide more details of your stream. I'm going to do some more experimentation, but this still surprises me a lot. – Jon Skeet Nov 01 '17 at 19:03
  • @JonSkeet My Stream produces lines from a file. There really isn't much to it. – Jeroen Nov 01 '17 at 19:11
  • That's not how `Stream.iterator` works. It actually doesn't fill any in-memory data structure, so @Jon is right here. Nevertheless, it all depends on the source of the stream. Check there for high memory consumption. – fps Nov 01 '17 at 19:12
  • @FedericoPeraltaSchaffner I am not getting high memory usage when using `forEach`, so I am puzzled what else could be causing high memory usage if it isn't `iterator`. – Jeroen Nov 01 '17 at 19:16
  • @JeroenBollen: https://gist.github.com/jskeet/f656a4ca224514ba928d761a7356cecd demonstrates `iterator()` working lazily - the filtering happens as the data is iterated over, not all in one go. It really does sound like it's the stream itself that's evaluating too eagerly. A [mcve] would really help... – Jon Skeet Nov 01 '17 at 19:16
  • @JeroenBollen please explain how you are getting the stream in the first place – fps Nov 01 '17 at 19:22
  • @FedericoPeraltaSchaffner I am using `Files.lines()` and applying some filters to it. – Jeroen Nov 01 '17 at 19:54
  • It turns out the increased memory usage was my error. After profiling, I found that the increase actually comes from my test code internally using an `ArrayList` rather than a set. This made the comparison between my code using `LambdaException` and the code using `Stream.iterator` very unfair. After correcting this mistake, memory usage is indeed no more excessive than is to be expected from running a Java application. @JonSkeet So it seems `Stream.iterator()` does work, feel free to submit an answer! – Jeroen Nov 01 '17 at 19:57
  • @FedericoPeraltaSchaffner: actually, it *does* fill a data structure, but that data structure has a size of *one* in most cases. There’s no way around it due to the way `Iterator` has been designed, i.e. `hasNext()` has to poll an element that will be returned later by `next()`, so it has to be stored somewhere between these two method calls. – Holger Nov 02 '17 at 09:01
  • @Holger Hi! I think you mean a holding consumer, or a one element array, don't remember now, but yes, of course you are correct. – fps Nov 02 '17 at 13:47
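
The laziness discussed in the comments above can be checked with a small sketch. It assumes an infinite source built with `Stream.iterate` (not the asker's `Files.lines()` stream), and the class and method names are invented for illustration. If `iterator()` collected the whole stream first, this method would never return:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

public class LazyIteratorDemo {

    // Hypothetical helper: takes the first `count` even numbers from an
    // infinite stream through its iterator.
    public static List<Integer> firstEvens(int count) {
        Iterator<Integer> iterator = Stream.iterate(1, n -> n + 1) // infinite source: 1, 2, 3, ...
                .filter(n -> n % 2 == 0)                           // filtering happens as elements are pulled
                .iterator();
        List<Integer> result = new ArrayList<>();
        while (result.size() < count && iterator.hasNext()) {
            result.add(iterator.next());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(firstEvens(3));
    }
}
```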

1 Answer


To iterate over a stream element-by-element, just call the iterator() method:

Iterator<String> iterator = stream.iterator();
while (iterator.hasNext()) {
    String element = iterator.next();
    // Use element
}

It's not clear to what extent that helps you in terms of checked exceptions, and it's worth noting that it's a terminal operation - once you've used the iterator, you'd need to get a new stream if you want to iterate again - but it does answer the question of how to read a stream one element at a time.
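
As a rough sketch of how this relates to checked exceptions: because the loop body is ordinary code rather than a lambda, a checked exception can propagate to the caller without wrapping. Here `doStuff` and `processAll` are hypothetical names, and `IOException` is just an example of a checked exception. The try-with-resources block is an assumption about typical usage; it matters for streams backed by a resource, such as the one returned by `Files.lines()`.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.stream.Stream;

public class IteratorCheckedExceptionDemo {

    // Hypothetical per-element work that may throw a checked exception.
    static void doStuff(String line) throws IOException {
        if (line.isEmpty()) {
            throw new IOException("empty line");
        }
    }

    // Walks the stream with a plain loop, so the checked exception simply
    // propagates to the caller. Returns the number of elements processed.
    public static int processAll(Stream<String> stream) throws IOException {
        int processed = 0;
        try (Stream<String> s = stream) { // closes the stream even if doStuff throws
            Iterator<String> iterator = s.iterator();
            while (iterator.hasNext()) {
                doStuff(iterator.next());
                processed++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        try {
            System.out.println(processAll(Stream.of("a", "b")));
            processAll(Stream.of("a", "", "c")); // the second element triggers the exception
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```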

Jon Skeet
  • The checked exceptions were only context for the question, but this helps, because checked exceptions can be thrown from a loop like this, whereas they cannot be thrown when iterating over a stream with `Stream.forEach`. – Jeroen Nov 01 '17 at 20:03