1

I want to compute a lazy sequence of primes.

Here is the interface:

user=> (take 10 primes)
(2 3 5 7 11 13 17 19 23 29)

So far, so good.

However, when I take 500 primes, this results in a stack overflow.

                  core.clj:  133  clojure.core/seq
                  core.clj: 2595  clojure.core/filter/fn
              LazySeq.java:   40  clojure.lang.LazySeq/sval
              LazySeq.java:   49  clojure.lang.LazySeq/seq
                   RT.java:  484  clojure.lang.RT/seq
                  core.clj:  133  clojure.core/seq
                  core.clj: 2626  clojure.core/take/fn
              LazySeq.java:   40  clojure.lang.LazySeq/sval
              LazySeq.java:   49  clojure.lang.LazySeq/seq
                 Cons.java:   39  clojure.lang.Cons/next
              LazySeq.java:   81  clojure.lang.LazySeq/next
                   RT.java:  598  clojure.lang.RT/next
                  core.clj:   64  clojure.core/next
                  core.clj: 2856  clojure.core/dorun
                  core.clj: 2871  clojure.core/doall
                  core.clj: 2910  clojure.core/partition/fn
              LazySeq.java:   40  clojure.lang.LazySeq/sval
              LazySeq.java:   49  clojure.lang.LazySeq/seq
                   RT.java:  484  clojure.lang.RT/seq
                  core.clj:  133  clojure.core/seq
                  core.clj: 2551  clojure.core/map/fn
              LazySeq.java:   40  clojure.lang.LazySeq/sval
              LazySeq.java:   49  clojure.lang.LazySeq/seq
                   RT.java:  484  clojure.lang.RT/seq
                  core.clj:  133  clojure.core/seq
                  core.clj: 3973  clojure.core/interleave/fn
              LazySeq.java:   40  clojure.lang.LazySeq/sval

I'm wondering what is the problem here and, more generally, when working with lazy sequences, how should I approach this class of error?

Here is the code.

(defn assoc-nth
  "Returns a lazy seq of coll, replacing every nth element by val

  Ex:
  user=> (assoc-nth [3 4 5 6 7 8 9 10] 2 nil)
  (3 nil 5 nil 7 nil 9 nil)
  "
  [coll n val]
  (apply concat
         (interleave
          (map #(take (dec n) %) (partition n coll)) (repeat [val]))))

(defn sieve
  "Returns a lazy seq of primes by Eratosthenes' method

  Ex:
  user=> (take 4 (sieve (iterate inc 2)))
  (2 3 5 7)

  user=> (take 10 (sieve (iterate inc 2)))
  (2 3 5 7 11 13 17 19 23 29)
  "
  [s]
  (lazy-seq
   (if (seq s)
     (cons (first s) (sieve
                      (drop-while nil? (assoc-nth (rest s) (first s) nil))))
     [])))

(def primes
  "Returns a lazy seq of primes

  Ex:
  user=> (take 10 primes)
  (2 3 5 7 11 13 17 19 23 29)
  "
  (concat [2] (sieve (filter odd? (iterate inc 3)))))
Charles R
  • 17,989
  • 6
  • 20
  • 18
  • 2
    Your implementation of the Sieve of Eratosthenes is not tail-recursive (the `recur` function is never used) so this is going to eventually blow the stack. Clojure != Lisp, so if you copied an implementation from another version of Lisp you'll need to rework it to make it tail-recursive in Clojure (and honestly I don't see how it would have been tail recursive in another Lisp either).. At first glance it appears that `sieve` will need to be re-done to make it tail-recursive. Best of luck. – Bob Jarvis - Слава Україні Sep 24 '15 at 03:07
  • 1
    lazy-seq can eliminate this concern through thunk indirection, so there is some other issue here. – noisesmith Sep 24 '15 at 03:35
  • I was unable to reproduce the stack overflow, and your code seems fine. When I increase the limit from 500 to 1000, the program gets too slow for me to be patient enough, and the heap usage is enormous. – Davyzhu Sep 24 '15 at 04:07
  • Each time you construct next prime, you construct it using lazy-sequence builders, like concat, on lazy seqs. So each time next prime is build from this lazy calls adding more lazy layers above. And when you evaluate the the nth element of primes, you have the large stack of lazy-seqs that call another lazy-seq e.t.c. – JustAnotherCurious Sep 24 '15 at 10:15

1 Answers1

4

You are generating a lot of lazy sequences that all take up stack space.

Breaking it down:

user=> (iterate inc 3)
;; produces first lazy seq (don't run in repl!)
(3 4 5 6 7 8 9 10 11 12 13 ...)

user=> (filter odd? (iterate inc 3))
;; again, don't run in repl, but lazy seq #2
(3 5 7 9 11 13 ...)

I personally would have done (iterate #(+ 2 %) 3) to cut a sequence out, but that's a drop in the ocean compared to the overall problem.

Now we start into the sieve which starts creating a new lazy seq for our output

(lazy-seq
 (cons 3 (sieve (drop-while nil? (assoc-nth '(5 7 9 ...) 3 nil)))))

Jumping into assoc-nth, we start creating more lazy sequences

user=> (partition 3 [5 7 9 11 13 15 17 19 21 23 25 ...])
;; lazy seq #4
((5 7 9) (11 13 15) (17 19 21) (23 25 27) ...)

user=> (map #(take 2 %) '((5 7 9) (11 13 15) (17 19 21) (23 25 27) ...))
;; lazy seq #5
((5 7) (11 13) (17 19) (23 25) ...)

user=> (interleave '((3 5) (9 11) (15 17) (21 23) ...) (repeat [nil]))
;; lazy seq #6 + #7 (repeat [nil]) is a lazy seq too
((5 7) [nil] (11 13) [nil] (17 19) [nil] (23 25) [nil] ...)

user=> (apply concat ...
;; lazy seq #8
(5 7 nil 11 13 nil 17 19 nil 23 25 nil ...)

So already you have 8 lazy sequences, and we've only applied first sieve.

Back to sieve, and it does a drop-while which produces lazy sequence number 9

user=> (drop-while nil? '(5 7 nil 11 13 nil 17 19 nil 23 25 nil ...))
;; lazy seq #9
(5 7 11 13 17 19 23 25 ...)

So for one iteration of sieve, we generated 9 sequences.

In the next iteration of sieve, you generate a new lazy-seq (not the original!), and the process begins again, generating another 7 sequences per loop.

The next problem is the efficiency of your assoc-nth function. A sequence of numbers and nils isn't an efficient way to mark multiples of a particular factor. You end up very quickly with a lot of very long sequences that can't be released, mostly filled with nils, and you need to read longer and longer sequences to decide if a candidate is a factor or not - this isn't efficient for lazy sequences which read chunks of 32 entries to be efficient, so when your factoring out 33 and above, you're pulling in multiple chunks of the sequence to work on.

Also, each time you're doing this in a completely new sequence.

Adding some debug to your assoc-nth, and running it on a small sample will illustrate this quickly.

user=> (sieve (take 50 (iterate #(+ 2 %) 3)))
assoc-nth n: 3 , coll: (5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101)
returning r: (5 7 nil 11 13 nil 17 19 nil 23 25 nil 29 31 nil 35 37 nil 41 43 nil 47 49 nil 53 55 nil 59 61 nil 65 67 nil 71 73 nil 77 79 nil 83 85 nil 89 91 nil 95 97 nil)
assoc-nth n: 5 , coll: (7 nil 11 13 nil 17 19 nil 23 25 nil 29 31 nil 35 37 nil 41 43 nil 47 49 nil 53 55 nil 59 61 nil 65 67 nil 71 73 nil 77 79 nil 83 85 nil 89 91 nil 95 97 nil)
returning r: (7 nil 11 13 nil 17 19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 49 nil 53 nil nil 59 61 nil nil 67 nil 71 73 nil 77 79 nil 83 nil nil 89 91 nil nil)
assoc-nth n: 7 , coll: (nil 11 13 nil 17 19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 49 nil 53 nil nil 59 61 nil nil 67 nil 71 73 nil 77 79 nil 83 nil nil 89 91 nil nil)
returning r: (nil 11 13 nil 17 19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil 53 nil nil 59 61 nil nil 67 nil 71 73 nil nil 79 nil 83 nil nil 89 nil)
;; ...
assoc-nth n: 17 , coll: (19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil 53 nil nil 59 61 nil nil)
returning r: (19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil)
assoc-nth n: 19 , coll: (nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil)
returning r: ()
(3 5 7 11 13 17 19)

This illustrates how you need more and more elements of the sequence to produce a list of primes, as I started with odd numbers 3 to 99, but only ended up with primes 3 to 19, but on the last iteration n=19, there weren't enough elements in my finite sequence to nil out further multiples.

Is there a solution?

I think you're looking for answers to "how do I make lazy sequences better at doing what I want to do?". Lazy sequences are going to be a trade off in this case. Your algorithm works, but generates too much stack. You're not using any recur, so you generate sequences on sequences. The first thing to look at would be how to make your methods more tail recursive, and lose some of the sequences. I can't offer solutions to that here, but I can link other solutions to the same problem and see if there are areas they do better than yours.

There are 2 decent (bound numbers) solutions at this implementation of Eratosthenes primes. One is using a range, and sieving it, the other using arrays of booleans (which is 40x quicker). The associated article with it (in Japenese but google translates well in Chrome) is a good read, and shows the times for naive approaches vs a very focused version using arrays directly and type hints to avoid further unboxing issues in the jvm.

There's another SO question that has a good implementation using transient values to improve efficiency. It has similar filtering methods to yours.

In each case, different sieving methods are used that avoid lazy sequences, however as there's an upper bound on the primes they generate, they can swap out generality for efficiency.

For unbound case, look at this gist for an infinite sequence of primes. In my tests it was 6x slower than the previous SO question, but still pretty good. The implementation however is very different.

Community
  • 1
  • 1
Mark Fisher
  • 9,838
  • 3
  • 32
  • 38