25

For educational purposes I want to create a stream of prime numbers using Java-8. Here's my approach. The number x is prime if it has no prime divisors not exceeding sqrt(x). So assuming I already have a stream of primes I can check this with the following predicate:

x -> Seq.seq(primes()).limitWhile(p -> p <= Math.sqrt(x)).allMatch(p -> x % p != 0)

Here I used jOOλ library (0.9.10 if it matters) just for limitWhile operation which is absent in standard Stream API. So now knowing some previous prime prev I can generate the next prime iterating the numbers until I find the one matching this predicate:

prev -> LongStream.iterate(prev + 1, i -> i + 1)
                  .filter(x -> Seq.seq(primes()).limitWhile(p -> p <= Math.sqrt(x))
                                                .allMatch(p -> x % p != 0))
                  .findFirst()
                  .getAsLong()

Putting everything together I wrote the following primes() method:

public static LongStream primes() {
    return LongStream.iterate(2L, 
            prev -> LongStream.iterate(prev + 1, i -> i + 1)
                              .filter(x -> Seq.seq(primes())
                                              .limitWhile(p -> p <= Math.sqrt(x))
                                              .allMatch(p -> x % p != 0))
                              .findFirst()
                              .getAsLong());
}

Now to launch this I use:

primes().forEach(System.out::println);

Unfortunately it fails with unpleasant StackOverflowError which looks like this:

Exception in thread "main" java.lang.StackOverflowError
at java.util.stream.ReferencePipeline$StatelessOp.opIsStateful(ReferencePipeline.java:624)
at java.util.stream.AbstractPipeline.<init>(AbstractPipeline.java:211)
at java.util.stream.ReferencePipeline.<init>(ReferencePipeline.java:94)
at java.util.stream.ReferencePipeline$StatelessOp.<init>(ReferencePipeline.java:618)
at java.util.stream.LongPipeline$3.<init>(LongPipeline.java:225)
at java.util.stream.LongPipeline.mapToObj(LongPipeline.java:224)
at java.util.stream.LongPipeline.boxed(LongPipeline.java:201)
at org.jooq.lambda.Seq.seq(Seq.java:2481)
at Primes.lambda$2(Primes.java:13)
at Primes$$Lambda$4/1555009629.test(Unknown Source)
at java.util.stream.LongPipeline$8$1.accept(LongPipeline.java:324)
at java.util.Spliterators$LongIteratorSpliterator.tryAdvance(Spliterators.java:2009)
at java.util.stream.LongPipeline.forEachWithCancel(LongPipeline.java:160)
at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:529)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:516)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.LongPipeline.findFirst(LongPipeline.java:474)
at Primes.lambda$0(Primes.java:14)
at Primes$$Lambda$1/918221580.applyAsLong(Unknown Source)
at java.util.stream.LongStream$1.nextLong(LongStream.java:747)
at java.util.Spliterators$LongIteratorSpliterator.tryAdvance(Spliterators.java:2009)
...

You might think that I deserve what I get: I called the primes() recursively inside the primes() method itself. However let's just change the method return type to Stream<Long> and use Stream.iterate instead, leaving everything else as is:

public static Stream<Long> primes() {
    return Stream.iterate(2L, 
            prev -> LongStream.iterate(prev + 1, i -> i + 1)
                              .filter(x -> Seq.seq(primes())
                                              .limitWhile(p -> p <= Math.sqrt(x))
                                              .allMatch(p -> x % p != 0))
                              .findFirst()
                              .getAsLong());
}

Now it works like a charm! Not very fast, but in couple of minutes I get the prime numbers exceeding 1000000 without any exceptions. The result is correct, which can be checked against the table of primes:

System.out.println(primes().skip(9999).findFirst());
// prints Optional[104729] which is actually 10000th prime.

So the question is: what's wrong with the first LongStream-based version? Is it jOOλ bug, JDK bug or I'm doing something wrong?

Note that I'm not interested in alternative ways to generate primes, I want to know what's wrong with this specific code.

Tagir Valeev
  • 97,161
  • 19
  • 222
  • 334
  • 4
    It's not JOOL. Replacing the Seq-based filter with equivalent `x -> primes().filter(p -> p * p > x || x % p == 0).findFirst().get() > Math.sqrt(x)` has the same behavior. Works for `Stream` but fails for `LongStream`. – Misha Mar 13 '16 at 09:27

2 Answers2

18

It seems that LongStream and Stream behave differently when streams are produced by iterate. The following code illustrates the distinction:

LongStream.iterate(1, i -> {
    System.out.println("LongStream incrementing " + i);
    return i + 1;
}).limit(1).count();

Stream.iterate(1L, i -> {
    System.out.println("Stream incrementing " + i);
    return i + 1;
}).limit(1).count();

The output is

LongStream incrementing 1

So LongStream will call the function even if only the first element is needed while Stream will not. This explains the exception you are getting.

I don't know if this should be called a bug. Javadoc doesn't specify this behavior one way or another although it would be nice if it were consistent.

One way to fix it is to hardcode the initial sequence of primes:

public static LongStream primes() {
    return LongStream.iterate(2L,
        prev -> prev == 2 ? 3 : 
                prev == 3 ? 5 :
                LongStream.iterate(prev + 1, i -> i + 1)
                        .filter(x -> Seq.seq(primes())
                            .limitWhile(p -> p <= Math.sqrt(x))
                            .allMatch(p -> x % p != 0)
                        ).findFirst()
                        .getAsLong());
Misha
  • 27,433
  • 6
  • 62
  • 78
15

You can produce this difference in much simpler ways. Consider the following two version of (equally inefficient) recursive long enumeration streams, which can be called as follows to produce a sequence from 1-5:

longs().limit(5).forEach(System.out::println);

Will cause the same StackOverflowError

public static LongStream longs() {
    return LongStream.iterate(1L, i ->
        1L + longs().skip(i - 1L)
                    .findFirst()
                    .getAsLong());
}

Will work

public static Stream<Long> longs() {
    return Stream.iterate(1L, i ->
        1L + longs().skip(i - 1L)
                    .findFirst()
                    .get());
}

The reason

The boxed Stream.iterate() implementation is optimised as follows:

    final Iterator<T> iterator = new Iterator<T>() {
        @SuppressWarnings("unchecked")
        T t = (T) Streams.NONE;

        @Override
        public boolean hasNext() {
            return true;
        }

        @Override
        public T next() {
            return t = (t == Streams.NONE) ? seed : f.apply(t);
        }
    };

unlike the LongStream.iterate() version:

    final PrimitiveIterator.OfLong iterator = new PrimitiveIterator.OfLong() {
        long t = seed;

        @Override
        public boolean hasNext() {
            return true;
        }

        @Override
        public long nextLong() {
            long v = t;
            t = f.applyAsLong(t);
            return v;
        }
    };

Notice how the boxed iterator calls the function only after the seed has been returned, whereas the primitive iterator caches the next value prior to returning the seed.

This means that when you use a recursive iteration function with the primitive iterator, the first value in the stream can never be produced, because the next value is fetched prematurely.

This can probably be reported as a JDK bug, and also explains Misha's observation

Community
  • 1
  • 1
Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509