When stress-testing some Clojure code at work, I noticed that it runs out of heap space when iterating over large data sets. I eventually managed to trace the issue back to the combination of Clojure's doseq macro and the implementation of lazy sequences.

This is the minimal code snippet that crashes Clojure by exhausting available heap space:

(doseq [e (take 1000000000 (iterate inc 1))] (identity e))

The documentation for doseq clearly states that it doesn't retain the head of the lazy sequence, so I would expect the memory complexity of the above code to be close to O(1). Is there something I'm missing? What's the Clojure-idiomatic way of iterating over extremely large lazy sequences, if doseq isn't up to the job?

  • Which Clojure version and Java runtime are you using? Running that code in a clean REPL with Clojure 1.4 / Java 1.6.0_33 on OS X shows completely static memory use at just under 400 MB – Joost Diepenmaat Aug 13 '12 at 11:42
  • Are you sure that's the exact snippet which is causing the problem? Runs fine on my machine (Clojure 1.4, JDK7, Windows, Eclipse/CCW). It *would* be a problem if you held onto the head of the sequence somehow, e.g. if (iterate inc 1) was stored somewhere else. – mikera Aug 14 '12 at 02:13
  • You're both right, the problem goes away when I turn off the things Leiningen 2 adds to the REPL (history, key navigation, etc.) and just use the vanilla Clojure 1.4. Thanks. – the80srobot Aug 14 '12 at 08:07
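
The head-retention point raised in the comments can be sketched as follows. This is only an illustration under assumptions: the function names `sum-streaming` and `sum-retaining` are hypothetical and not from the original post, and the actual memory behaviour depends on the Clojure version and on whatever tooling wraps the REPL.

```clojure
;; Streaming: the lazy sequence exists only as an argument to reduce,
;; so elements that have already been visited can be garbage-collected.
(defn sum-streaming [n]
  (reduce + (take n (iterate inc 1))))

;; Head retained: xs is used again after the traversal, so the local
;; cannot be cleared and every realized element stays reachable
;; while doseq walks the sequence.
(defn sum-retaining [n]
  (let [xs (take n (iterate inc 1))]
    (doseq [e xs] (identity e))
    (first xs))) ; this later use keeps the head alive during doseq
```

For small n both functions behave the same; the difference should only show up as heap growth when n is very large.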

1 Answer

When I run this sample, I see memory usage hit 2.0 GB, so perhaps you are actually just running out of RAM.

It sure does take a while to run:

user=> (time (doseq [e (take 1000000000 (iterate inc 1))] (identity e)))
"Elapsed time: 266396.221132 msecs"

From top:

23999 arthur    20   0 4001m 1.2g 5932 S  213 15.3  17:11.35 java                                          
24017 arthur    20   0 3721m 740m 5548 S   88  9.3  13:49.95 java  
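
To compare figures like these with what the JVM itself reports, the heap limit and current usage can be read from inside the REPL. This is a rough sketch, not part of the original measurement; `heap-mb` is a hypothetical helper name, and it only uses the standard `java.lang.Runtime` methods `maxMemory`, `totalMemory`, and `freeMemory`.

```clojure
;; Report the JVM's heap ceiling and current heap usage in megabytes.
(defn heap-mb []
  (let [rt (Runtime/getRuntime)
        mb (* 1024.0 1024.0)]
    {:max-mb  (/ (.maxMemory rt) mb)                        ; upper bound the heap may grow to
     :used-mb (/ (- (.totalMemory rt) (.freeMemory rt)) mb)})) ; heap currently in use
```

Calling `(heap-mb)` before and during the doseq run gives a rough idea of whether the heap limit, rather than physical RAM, is the constraint.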
Arthur Ulfeldt