13

Consider the following code:

urls.stream()
    .flatMap(url -> fetchDataFromInternet(url).stream())
    .filter(...)
    .findFirst()
    .get();

Will fetchDataFromInternet be called for second url when the first one was enough?

I tried with a smaller example and it looks like working as expected. i.e processes data one by one but can this behavior be relied on? If not, does calling .sequential() before .flatMap(...) help?

    Stream.of("one", "two", "three")
            .flatMap(num -> {
                System.out.println("Processing " + num);
                // return FetchFromInternetForNum(num).data().stream();
                return Stream.of(num);
            })
            .peek(num -> System.out.println("Peek before filter: "+ num))
            .filter(num -> num.length() > 0)
            .peek(num -> System.out.println("Peek after filter: "+ num))
            .forEach(num -> {
                System.out.println("Done " + num);
            });

Output:

Processing one
Peek before filter: one
Peek after filter: one
Done one
Processing two
Peek before filter: two
Peek after filter: two
Done two
Processing three
Peek before filter: three
Peek after filter: three
Done three

Update: Using official Oracle JDK8 if that matters on implementation

Answer: Based on the comments and the answers below, flatmap is partially lazy. i.e reads the first stream fully and only when required, it goes for next. Reading a stream is eager but reading multiple streams is lazy.

If this behavior is intended, the API should let the function return an Iterable instead of a stream.

In other words: link

balki
  • 26,394
  • 30
  • 105
  • 151
  • 2
    The doc on [parallelism](https://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html) says "When you create a stream, it is always a serial stream unless otherwise specified.", so a call to `.sequential()` isn't necessary. – teppic Sep 18 '17 at 22:33
  • What makes you think it isn't? – pedromss Sep 18 '17 at 23:40
  • @pedromss Documentation doesn't say it explicitly. https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#flatMap-java.util.function.Function- And looks like there are few cases where it may not be lazy: https://stackoverflow.com/questions/29229373/why-filter-after-flatmap-is-not-completely-lazy-in-java-streams – balki Sep 18 '17 at 23:51
  • @balki the SO post you linked states in the accepted answer that intermediate operations are always lazy. Also, from the [documentation](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html): "Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed." Flatmap is an intermediate operation – pedromss Sep 18 '17 at 23:54
  • Thanks! It was not very obvious for me. I will accept if you add an answer. – balki Sep 19 '17 at 00:02
  • 3
    `fetchDataFromInternet` will not be called more than necessary, but the elements returned by a particular `fetchDataFromInternet` invocation might get processed without laziness. – Holger Sep 19 '17 at 07:17

3 Answers3

15

Under the current implementation, flatmap is eager; like any other stateful intermediate operation (like sorted and distinct). And it's very easy to prove :

 int result = Stream.of(1)
            .flatMap(x -> Stream.generate(() -> ThreadLocalRandom.current().nextInt()))
            .findFirst()
            .get();

    System.out.println(result);

This never finishes as flatMap is computed eagerly. For your example:

urls.stream()
    .flatMap(url -> fetchDataFromInternet(url).stream())
    .filter(...)
    .findFirst()
    .get();

It means that for each url, the flatMap will block all others operation that come after it, even if you care about a single one. So let's suppose that from a single url your fetchDataFromInternet(url) generates 10_000 lines, well your findFirst will have to wait for all 10_000 to be computed, even if you care about only one.

EDIT

This is fixed in Java 10, where we get our laziness back: see JDK-8075939

EDIT 2

This is fixed in Java 8 too (8u222): JDK-8225328

ZhekaKozlov
  • 36,558
  • 20
  • 126
  • 155
Eugene
  • 117,005
  • 15
  • 201
  • 306
5

It’s not clear why you set up an example that does not address the actual question, you’re interested in. If you want to know, whether the processing is lazy when applying a short-circuiting operation like findFirst(), well, then use an example using findFirst() instead of forEach that processes all elements anyway. Also, put the logging statement right into the function whose evaluation you want to track:

Stream.of("hello", "world")
      .flatMap(s -> {
          System.out.println("flatMap function evaluated for \""+s+'"');
          return s.chars().boxed();
      })
      .peek(c -> System.out.printf("processing element %c%n", c))
      .filter(c -> c>'h')
      .findFirst()
      .ifPresent(c -> System.out.printf("found an %c%n", c));
flatMap function evaluated for "hello"
processing element h
processing element e
processing element l
processing element l
processing element o
found an l

This demonstrates that the function passed to flatMap gets evaluated lazily as expected while the elements of the returned (sub-)stream are not evaluated as lazy as possible, as already discussed in the Q&A you have linked yourself.

So, regarding your fetchDataFromInternet method that gets invoked from the function passed to flatMap, you will get the desired laziness. But not for the data it returns.

Holger
  • 285,553
  • 42
  • 434
  • 765
1

Today I also stumbled up on this bug. Behavior is not so strait forward, cause simple case, like below, is working fine, but similar production code doesn't work.

 stream(spliterator).map(o -> o).flatMap(Stream::of)..flatMap(Stream::of).findAny()

For guys who cannot wait another couple years for migration to JDK-10 there is a alternative true lazy stream. It doesn't support parallel. It was dedicated for JavaScript translation, but it worked out for me, cause interface is the same.

StreamHelper is collection based, but it is easy to adapt Spliterator.

https://github.com/yaitskov/j4ts/blob/stream/src/main/java/javaemul/internal/stream/StreamHelper.java

Daniil Iaitskov
  • 5,525
  • 8
  • 39
  • 49