Clojure lazy sequences in math.combinatorics results in OutOfMemory (OOM) Error

Question

The documentation of math.combinatorics states that all functions return lazy sequences.

However if I try to run subsets with a lot of data,

(last (combinatorics/subsets (range 20)))
;OutOfMemoryError Java heap space  clojure.lang.RT.cons (RT.java:559)

I get an OutOfMemory Error.

Running

(last (range))

burns CPU, but it doesn't return an error.

Clojure doesn't seem to "hold on the head" like explained in this Stack Overflow question.

Why is this happening and how I can use bigger ranges in subsets?

Update

It seems to work on some peoples computers as the comments suggest. So I will post my system configuration

I run a Mac (10.8.3) and installed Clojure (1.5.1) with Homebrew.

My Java version is:

% java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06-451-11M4406)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01-451, mixed mode)

I didn't change any of the default settings. I also reinstalled all dependencies, by deleting the ~/.m2 folder.

My projects.clj.

And the command I used was this

% lein repl
nREPL server started on port 61774
REPL-y 0.1.10
Clojure 1.5.1
=> (require 'clojure.math.combinatorics)
nil
=> (last (clojure.math.combinatorics/subsets (range 20)))
OutOfMemoryError Java heap space  clojure.lang.RT.cons (RT.java:570)
or
OutOfMemoryError Java heap space  clojure.math.combinatorics/index-combinations/fn--1148/step--1164 (combinatorics.clj:64)

I tested the problem on a colleague's laptop, and he had the same issue, but he was on a Mac, too.

Works fine without OOM error.. which clojure version you are using? — Ankur, Apr 24 '13 at 15:19
Works for me using Clojure 1.5.1 on Linux with default JVM settings. — user100464, Apr 24 '13 at 16:14
clojure 1.5.1 on a mac (installed with homebrew). I updated the problem description with more details. — , Apr 27 '13 at 12:12
@Ankur user100464 What happens when you run (last (clojure.math.combinatorics/subsets (range 1000)))? What -Xmx settings do you have? — , Apr 29 '13 at 09:30

DanLebrero · Answer 1 · 2013-04-29T21:38:56.150

3

The issue is that subsets uses mapcat, and mapcat is not lazy enough as it uses apply which realizes and holds some of the elements to be concatenated. See a very nice explanation here. Using the lazier mapcat version of that link in subsets should fix the issue:

(defn my-mapcat
   [f coll]
   (lazy-seq
     (if (not-empty coll)
      (concat
      (f (first coll))
     (my-mapcat f (rest coll))))))

(defn subsets
  "All the subsets of items"
  [items]
  (my-mapcat (fn [n] (clojure.math.combinatorics/combinations items n))
  (range (inc (count items)))))

 (last (subsets (range 50))) ;; this will take hours to compute, good luck with it!

edited Apr 29 '13 at 21:38

answered Apr 29 '13 at 01:44

DanLebrero

8,545
1
29
30

I expected the library to have constant memory usage. I can increased Xmx to 4gb, but this can barely handle 50 Elements in the range. I want to create subsets of sets with 1000+ elements. – Apr 29 '13 at 09:52
what version of java are you using? – DanLebrero Apr 29 '13 at 12:09
I'm running java version "1.6.0_45" on a Mac OS X in 64bit – Apr 29 '13 at 14:52

score 3 · Answer 2 · answered Apr 29 '13 at 21:42

You want to compute the power set of a set with 1000 elements? You know that's going to have 2^1000 elements, right? That is so large I can't even find a good way to describe how enormous it is. If you're trying to work with such a set, and you can do so lazily, your problem won't be memory: it will be computation time. Let's say you have a supercomputer with infinite memory, capable of processing a trillion items per nanosecond: that's 10^21 items processed per second, or about 10^29 items per year. Even this supercomputer will take much, much longer than the lifetime of the universe to work through the items of (subsets (range 1000)).

So I'd say, stop worrying about the memory usage of this collection, and work on an algorithm that doesn't involve walking through sequences with more elements than there are atoms in the universe.

You're absolutely right. Computing a big solution set with subsets will take ages and is not a viable option in a real application. However having a version of subsets that is not prone to constant memory usage would be important for applications with a more realistic use case, too. — , May 01 '13 at 09:30

score 2 · Answer 3 · edited May 23 '17 at 12:16

The problem is neither with apply, nor with concat, nor with mapcat.

dAni's answer, where he reimplements mapcat, does accidentally results in fixing the problem, but the reasoning behind it is not correct. Also, his answer points to an article, where the author says "I believe the problem lies in apply". This is clearly wrong, as I am about to explain below. Finally, the issue at hand is not related to this other one, where non-lazy evaluation is indeed caused by apply.

If you look closely, both dAni and the author of that article implement mapcat without the use of the map function. I will show in the next example that the issue is related to the way the map function is implemented.

To demonstrate that the issue is not related to either apply or concat see the following implementation of mapcat. It uses both concat and apply, still it achieves full laziness:

(defn map
  ([f coll]
     (lazy-seq
      (when-let [s (seq coll)]
        (cons (f (first s)) (map f (rest s)))))))

(defn mapcat [f & colls]
  (apply concat (apply map f colls)))

(defn range-announce! [x]
  (do (println "Returning sequence of" x)
      (range x)))

;; new fully lazy implementation prints only 5 lines
(nth (mapcat range-announce! (range)) 5)

;; clojure.core version still prints 32 lines
(nth (clojure.core/mapcat range-announce! (range)) 5)

The full laziness in the above code is achieved by reimplementing the map function. In fact mapcat is implemented exactly the same way as in clojure.core, yet it works fully lazy. The above map implementation is a bit simplified for the sake of the example, as it only supports a single parameter, but even implementing it with the whole variadic signature will work the same: full laziness. So we showed that the problem here is neither with apply nor with concat. Also, we showed that the real problem must be related to how the map function is implemented in clojure.core. Let's take a look at it:

(defn map
  ([f coll]
   (lazy-seq
    (when-let [s (seq coll)]
      (if (chunked-seq? s)
        (let [c (chunk-first s)
              size (int (count c))
              b (chunk-buffer size)]
          (dotimes [i size]
              (chunk-append b (f (.nth c i))))
          (chunk-cons (chunk b) (map f (chunk-rest s))))
        (cons (f (first s)) (map f (rest s))))))))

It can be seen that the clojure.core implementation is exactly the same as our "simplified" version before, except for the true branch of the if (chunked-seq? s) expression. Essentially clojure.core/map has a special case for handling input sequences which are chunked sequences.

Chunked sequences compromise laziness by evaluating in chunks of 32 instead of strictly one at a time. This becomes painfully evident when evaluating deeply nested chunked sequences, like in the case of subsets. Chunked sequences were introduced in Clojure 1.1, and many core functions were upgraded to recognize and process them differently, including map. The main purpose of introducing them was to improve performance in certain stream-processing scenarios, but arguably they make it significantly harder to reason about the laziness characteristics of a program. You can read up on chunked sequences here and here. Also check out this question here.

The real problem is that range returns a chunked seq, and is used internally by subsets. The fix recommended by David James patches subsets to unchunk the sequence created by range internally.

score 1 · Answer 4 · edited May 23 '17 at 12:22

1

This issue has been raised on the project's ticket tracker: Clojure JIRA: OutOfMemoryError with combinatorics/subsets. There, you can find a patch by Andy Fingerhut. It worked for me. Note that the patch is different than the mapcat variation suggested by another answer.

edited May 23 '17 at 12:22

Community

1
1

answered Jun 24 '13 at 12:11

David J.

31,569
22
122
174

score 0 · Answer 5 · answered Apr 29 '13 at 15:15

0

In the absence of command line arguments, the startup heap size parameters of a JVM are determined by various ergonomics

The defaults (JDK 6) are

   
   initial heap size    memory / 64
   maximum heap size    MIN(memory / 4, 1GB)

but you can force an absolute value using the -Xmx and -Xms args You can find more detail here

answered Apr 29 '13 at 15:15

sw1nn

7,278
1
26
36

1

Setting xmx isn't the issue. I try to understand why this function takes more memory than it should. – Apr 29 '13 at 16:36

Clojure lazy sequences in math.combinatorics results in OutOfMemory (OOM) Error

5 Answers5