The problem is neither with apply
, nor with concat
, nor with mapcat
.
dAni's answer, where he reimplements mapcat
, does accidentally results in fixing the problem, but the reasoning behind it is not correct. Also, his answer points to an article, where the author says "I believe the problem lies in apply". This is clearly wrong, as I am about to explain below. Finally, the issue at hand is not related to this other one, where non-lazy evaluation is indeed caused by apply
.
If you look closely, both dAni and the author of that article implement mapcat
without the use of the map
function. I will show in the next example that the issue is related to the way the map
function is implemented.
To demonstrate that the issue is not related to either apply
or concat
see the following implementation of mapcat
. It uses both concat
and apply
, still it achieves full laziness:
(defn map
([f coll]
(lazy-seq
(when-let [s (seq coll)]
(cons (f (first s)) (map f (rest s)))))))
(defn mapcat [f & colls]
(apply concat (apply map f colls)))
(defn range-announce! [x]
(do (println "Returning sequence of" x)
(range x)))
;; new fully lazy implementation prints only 5 lines
(nth (mapcat range-announce! (range)) 5)
;; clojure.core version still prints 32 lines
(nth (clojure.core/mapcat range-announce! (range)) 5)
The full laziness in the above code is achieved by reimplementing the map
function. In fact mapcat
is implemented exactly the same way as in clojure.core
, yet it works fully lazy. The above map
implementation is a bit simplified for the sake of the example, as it only supports a single parameter, but even implementing it with the whole variadic signature will work the same: full laziness. So we showed that the problem here is neither with apply
nor with concat
. Also, we showed that the real problem must be related to how the map
function is implemented in clojure.core
. Let's take a look at it:
(defn map
([f coll]
(lazy-seq
(when-let [s (seq coll)]
(if (chunked-seq? s)
(let [c (chunk-first s)
size (int (count c))
b (chunk-buffer size)]
(dotimes [i size]
(chunk-append b (f (.nth c i))))
(chunk-cons (chunk b) (map f (chunk-rest s))))
(cons (f (first s)) (map f (rest s))))))))
It can be seen that the clojure.core
implementation is exactly the same as our "simplified" version before, except for the true
branch of the if (chunked-seq? s)
expression. Essentially clojure.core/map
has a special case for handling input sequences which are chunked sequences.
Chunked sequences compromise laziness by evaluating in chunks of 32 instead of strictly one at a time. This becomes painfully evident when evaluating deeply nested chunked sequences, like in the case of subsets
. Chunked sequences were introduced in Clojure 1.1, and many core functions were upgraded to recognize and process them differently, including map
. The main purpose of introducing them was to improve performance in certain stream-processing scenarios, but arguably they make it significantly harder to reason about the laziness characteristics of a program. You can read up on chunked sequences here and here. Also check out this question here.
The real problem is that range
returns a chunked seq, and is used internally by subsets
. The fix recommended by David James patches subsets
to unchunk the sequence created by range
internally.