I get an OOM in Java 19 even with 8GB if I do this:

IntStream
    .iterate(0, i -> i + 1)
    .skip(2)
    .limit(10_000_000)
    .filter(i -> checkSum(i) <= 20)
    .parallel()
    .count();

However, I don't get any OOM if I omit skip(2):

IntStream
    .iterate(0, i -> i + 1)
    //.skip(2)
    .limit(10_000_000)
    .filter(i -> checkSum(i) <= 20)
    .parallel()
    .count();

where checkSum(...) is:

public static long checkSum(long n) {
    long result = 0;
    long remaining = n;
    while (0 < remaining) {
        long remainder = remaining % 10;
        result += remainder;
        remaining = (remaining - remainder) / 10;
    }
    return result;
}

Why does skip() in this parallel stream expression in Java 19 cause an OOM even with 8GB?

I know I should use range(...) instead of iterate() + limit(), with or without skip(). However, that doesn't answer my question: I would like to understand what the issue is here.

mmirwaldt
  • Order of operations is important. Move the `skip(2)` down a line, and it will work. – Turing85 Jan 05 '23 at 00:57
  • If you do not want to use `.range(...)` (because maybe the numbers you want are irregular), you can already apply the limit in the `iterate`: `....iterate(0, i -> i < 10_000_002, i -> i + 1)...` – Turing85 Jan 05 '23 at 01:04
  • I know the order is important. And yes, it works if skip() comes AFTER limit(), even if that changes the result a bit here. However, I don't understand why skip() before limit() leads to an OOM. I would only have expected it to hang. – mmirwaldt Jan 05 '23 at 01:05
  • Hmm... the behaviour seems only partially related to `skip(...)`. It is also influenced by `parallel()` (removing `....parallel()...` also prevents the `Exception`).... – Turing85 Jan 05 '23 at 01:16
  • this seems pretty closely related: [Java 8, using .parallel in a stream causes OOM error](https://stackoverflow.com/questions/30825708/java-8-using-parallel-in-a-stream-causes-oom-error) – Turing85 Jan 05 '23 at 01:22
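
The two workarounds suggested in the comments above can be sketched roughly as follows. This is only an illustration: the class name `SkipAfterLimit` and the method names are hypothetical, and `limit(n + 2)` is used instead of `limit(n)` on the assumption that the intent is to keep the same elements (2 to n + 1) as the original `skip(2).limit(n)`.

```java
import java.util.stream.IntStream;

// Hypothetical class and method names, used only for illustration.
class SkipAfterLimit {
    // Digit sum, equivalent to checkSum(...) from the question.
    static long checkSum(long n) {
        long result = 0;
        for (long remaining = n; remaining > 0; remaining /= 10) {
            result += remaining % 10;
        }
        return result;
    }

    // limit(...) comes first, so the infinite source is bounded before skip(...).
    // limit(n + 2) followed by skip(2) keeps the elements 2 .. n + 1.
    static long limitThenSkip(int n) {
        return IntStream.iterate(0, i -> i + 1)
                .limit(n + 2L)
                .skip(2)
                .filter(i -> checkSum(i) <= 20)
                .parallel()
                .count();
    }

    // Three-argument iterate (Java 9+) bounds the source itself,
    // so no limit(...) is needed at all.
    static long boundedIterate(int n) {
        final int end = n + 2;
        return IntStream.iterate(0, i -> i < end, i -> i + 1)
                .skip(2)
                .filter(i -> checkSum(i) <= 20)
                .parallel()
                .count();
    }

    public static void main(String[] args) {
        int n = 1_000_000; // the question uses 10_000_000
        System.out.println(limitThenSkip(n));
        System.out.println(boundedIterate(n));
    }
}
```

Both methods should return the same count for the same `n`, since they select the same elements.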

1 Answer

skip() is a stateful operation. It guarantees that the first n elements of the stream (in encounter order, if the stream is ordered) are omitted.

It is cheap in a sequential pipeline but can be costly when running in parallel if the stream is ordered. The documentation warns about this and suggests relaxing the ordering constraint where possible.

API Note:

While skip() is generally a cheap operation on sequential stream pipelines, it can be quite expensive on ordered parallel pipelines, especially for large values of n, since skip(n) is constrained to skip not just any n elements, but the first n elements in the encounter order. Using an unordered stream source (such as generate(Supplier)) or removing the ordering constraint with BaseStream.unordered() may result in significant speedups of skip() in parallel pipelines, if the semantics of your situation permit. If consistency with encounter order is required, and you are experiencing poor performance or memory utilization with skip() in parallel pipelines, switching to sequential execution with BaseStream.sequential() may improve performance.
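
The last suggestion in that note, switching to sequential execution, could look like the sketch below. The class name `SequentialSkip` and the method name `countSequential` are hypothetical. The execution mode applies to the whole pipeline and the last `parallel()` or `sequential()` call before the terminal operation wins, so simply leaving out `parallel()` would have the same effect.

```java
import java.util.stream.IntStream;

// Hypothetical class and method names, used only for illustration.
class SequentialSkip {
    // Digit sum, equivalent to checkSum(...) from the question.
    static long checkSum(long n) {
        long result = 0;
        for (long remaining = n; remaining > 0; remaining /= 10) {
            result += remaining % 10;
        }
        return result;
    }

    // Same pipeline as in the question, but forced to run sequentially,
    // where skip(...) is a cheap operation.
    static long countSequential(int n) {
        return IntStream.iterate(0, i -> i + 1)
                .skip(2)
                .limit(n)
                .filter(i -> checkSum(i) <= 20)
                .sequential()
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countSequential(10_000_000));
    }
}
```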

Emphasis added

The following unordered stream runs without issues. Note that the result may not be consistent between runs, because in an unordered stream the threads are free to discard any n elements, not necessarily the first n:

IntStream
    .iterate(0, i -> i + 1)
    .unordered()
    .skip(2)
    .limit(10_000_000)
    .filter(i -> checkSum(i) <= 20)
    .parallel()
    .count();
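
If a consistent result matters, the range(...) alternative that the question already mentions avoids skip() and limit() entirely. A minimal sketch, with the hypothetical names `RangeCount` and `countWithRange`, assuming the goal is to count over the elements 2 to n + 1:

```java
import java.util.stream.IntStream;

// Hypothetical class and method names, used only for illustration.
class RangeCount {
    // Digit sum, equivalent to checkSum(...) from the question.
    static long checkSum(long n) {
        long result = 0;
        for (long remaining = n; remaining > 0; remaining /= 10) {
            result += remaining % 10;
        }
        return result;
    }

    // range(2, n + 2) covers the same elements as skip(2).limit(n) on 0, 1, 2, ...
    // The source has a known size, so no stateful slicing operation is needed.
    static long countWithRange(int n) {
        return IntStream.range(2, n + 2)
                .parallel()
                .filter(i -> checkSum(i) <= 20)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countWithRange(10_000_000));
    }
}
```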
Alexander Ivanchenko