7

Here's a snippet of code that gives me a StackOverflowError (boiled down from an actual example in my codebase):

( ->> (range 3000)
      (mapcat #(concat [0] (take 100 (repeat %))))
      (reduce (constantly nil))
      (count))

(Note: this code isn't designed to do anything other than demonstrate the problem or return zero.)

I can "rescue" it by any of the following steps:

  1. Remove the reduce line
  2. Change [0] to '(0)
  3. Add a (take 100000000) (or any integer) at any point between mapcat and count.

I'm basically baffled by this behavior (especially #2). I'd appreciate any input.

(I have a feeling this may be related to Why does reduce give a StackOverflowError in Clojure?, but I can't quite tell how -- so if it is related, I'd appreciate an explanation of why.)

Community
  • 1
  • 1
Bosh
  • 8,138
  • 11
  • 51
  • 77
  • Is it suppose to return 0? Interestingly I don't get a stack overflow if I change it to this: ( ->> (range) (mapcat (fn [p] (concat [0] (take 100 (repeat nil))))) (take 3000) (reduce (constantly nil)) (count)) – Hugo Oct 16 '14 at 05:42
  • Yes, it's "supposed to" return 0. (I mean, the code isn't really supposed to do anything other than demonstrate this behavior in the shortest way I could find.) The `count` is just in there to demonstrate that consuming the sequence itself doesn't cause an error. In your example, @firthh, you can change (take 3000) to (take 3000000) if you like, and it still works. You can also put the (take 3000) after the reduce step... – Bosh Oct 16 '14 at 05:51
  • reduce is strict, and uses constant stack space - it is more likely the concat / mapcat calls stacking up unevaluated before being forced that are causing the issue – noisesmith Oct 16 '14 at 05:55
  • @noisesmith that's not really an explanation, though. Why does `count` succeed when I remove the `reduce`? And why does '(0) act differently? And why doesn't wrapping the concat (or mapcat) result in (doall) help? – Bosh Oct 16 '14 at 05:58

2 Answers2

10

Under normal circumstances, reduce operates using a loop/recur construct and uses constant stack space. However, you've hit a nasty corner case caused by reducing over a sequence produced by providing concat with alternating chunked and non-chunked sequences (the vector [0] is chunked; the seq produced by (take 100 (repeat %)) is non-chunked).

When the first argument to concat is a chunked sequence, then it will return a lazy sequence that will use chunk-cons to produce another chunked sequence. Otherwise, it will use cons to produce a non-chunked sequence.

Meanwhile, the implementation of reduce uses the InternalReduce protocol (defined in clojure.core.protocols) which provides an internal-reduce function for structures that can reduce themselves more efficiently than with the default first/next recursion. The internal-reduce implementation for chunked sequences uses chunk functions to consume the chunked items in a loop up until it's left with a non-chunked sequence, then calls internal-reduce on the remainder. The default internal-reduce implementation similarly uses first/next to consume items in a loop until the underlying seq type changes, then calls internal-reduce on the new seq type to dispatch to the appropriate optimized version. As you progress through the seq that concat produced, alternating between chunked and non-chunked sub-sequences, the internal-reduce calls pile up on the stack and eventually blow it.

A simpler illustration of this case is:

;; All chunked sub-seqs is OK
user> (reduce + (apply concat (take 10000 (repeat [1]))))
10000

;; All non-chunked sub-seqs is OK
user> (reduce + (apply concat (take 10000 (repeat '(1)))))
10000

;; Interleaved chunked and non-chunked sub-seqs blows the stack
user> (reduce + (apply concat (take 10000 (interleave (repeat [1]) (repeat '(1))))))
StackOverflowError   clojure.lang.LazySeq.seq (LazySeq.java:60)

Examining the stack trace:

StackOverflowError 
    clojure.core/seq (core.clj:133)
    clojure.core/interleave/fn--4525 (core.clj:3901)
    clojure.lang.LazySeq.sval (LazySeq.java:42)
    clojure.lang.LazySeq.seq (LazySeq.java:60)
    clojure.lang.RT.seq (RT.java:484)
    clojure.core/seq (core.clj:133)
    clojure.core/take/fn--4232 (core.clj:2554)
    clojure.lang.LazySeq.sval (LazySeq.java:42)
    clojure.lang.LazySeq.seq (LazySeq.java:60)
    clojure.lang.Cons.next (Cons.java:39)
    clojure.lang.RT.next (RT.java:598)
    clojure.core/next (core.clj:64)
    clojure.core/concat/cat--3925/fn--3926 (core.clj:694)
    clojure.lang.LazySeq.sval (LazySeq.java:42)
    clojure.lang.LazySeq.seq (LazySeq.java:60)
    clojure.lang.ChunkedCons.chunkedNext (ChunkedCons.java:59)
    clojure.core/chunk-next (core.clj:660)
    clojure.core.protocols/fn--6041 (protocols.clj:101)
    clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
    clojure.core.protocols/fn--6034 (protocols.clj:147)
    clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
    clojure.core.protocols/fn--6041 (protocols.clj:104)
    clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
    clojure.core.protocols/fn--6034 (protocols.clj:147)
    clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
    clojure.core.protocols/fn--6041 (protocols.clj:104)
    clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
    clojure.core.protocols/fn--6034 (protocols.clj:147)
    clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
    clojure.core.protocols/fn--6041 (protocols.clj:104)

As for your workarounds:

  • Clearly, avoiding reduce prevents the problem altogether.
  • Changing [0] to '(0) replaces the chunked sub-sequences with non-chunked ones, circumventing the optimization for chunked seqs in internal-reduce and allowing the reduction to happen in a single loop with constant stack space.
  • Inserting take creates a new, non-chunked sequence, composed entirely of cons cells.
Alex
  • 13,811
  • 1
  • 37
  • 50
0

I think the problem is with mapcat, which calls concat, which uses cons. cons on vectors is expensive (and probably consumes stack), while it's cheap for lists. That's why changing from vector to list "fixes" the problem.

ivant
  • 3,909
  • 1
  • 25
  • 39
  • Can you clarify why the failure only happens in the `reduce` step? If I remove `reduce`, why does `count` still work (which consumes the whole sequenece with no trouble) – Bosh Oct 16 '14 at 14:58