I didn't find the current answer clear enough, so here goes...
sequence does return a LazySeq, but a chunked one. When you play around with it in the REPL, you will often get the impression that it is eager, because your collection is probably too small and the chunking hides the laziness. I believe the chunk size is not strictly fixed, so chunks won't always be exactly the same size, but in general it seems to be 32. So your transducer will be applied to the input collection 32 elements at a time, lazily.
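As a quick sanity check of the claim above, here is a minimal sketch that inspects the value returned by sequence instead of relying on printed output. The name xs and the (map inc) transducer are just placeholders for illustration, and the chunk size of 32 is what I'd expect on the Clojure versions I'm aware of, not something the API promises.

```clojure
;; xs is a hypothetical example name; (map inc) is an arbitrary transducer.
(def xs (sequence (map inc) (range 100)))

;; The value returned by sequence is a lazy sequence...
(instance? clojure.lang.LazySeq xs)

;; ...and calling seq on it realizes the first chunk of results.
(chunked-seq? (seq xs))

;; Number of results held in that first chunk (expected to be 32 here,
;; since the input has more than 32 elements).
(count (chunk-first (seq xs)))
```

With an input shorter than one chunk, the whole thing is realized on first access, which is why small REPL experiments look eager.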
Here's a simple transducer that just prints the elements it reduces over and returns them untouched:
(defn printer
  [xf]
  (fn
    ([] (xf))                ; init arity
    ([result] (xf result))   ; completion arity
    ([result input]          ; step arity: print the element, then pass it on
     (println input)
     (xf result input))))
If we create a sequence s of 100 elements with it:
(def s
  (sequence
    printer
    (range 100)))
;;> 0
We see that it prints 0, but nothing else. On the call to sequence, the first element is thus consumed from (range 100) and passed to the xf chain to be transformed, which in our case just prints it. No other elements have been consumed yet.
Now if we take one element from s:
(take 1 s)
;;> 1
;;> 2
;;> 3
;;> 4
;;> 5
;;> 6
;;> 7
;;> 8
;;> 9
;;> 10
;;> 11
;;> 12
;;> 13
;;> 14
;;> 15
;;> 16
;;> 17
;;> 18
;;> 19
;;> 20
;;> 21
;;> 22
;;> 23
;;> 24
;;> 25
;;> 26
;;> 27
;;> 28
;;> 29
;;> 30
;;> 31
;;> 32
We see that the elements up to 32 have now been printed. This is the normal behavior of chunked lazy sequences in Clojure. You can think of them as semi-lazy: they consume chunk-size elements at a time, instead of one at a time.
Now if we take anywhere from 1 to 32 elements, nothing else is printed, because the first 32 elements have already been processed:
(take 1 s)
;; => (0)
(take 10 s)
;; => (0 1 2 3 4 5 6 7 8 9)
(take 24 s)
;; => (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23)
(take 32 s)
;; => (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31)
Nothing gets printed, and each take returns the expected result. I'm using ;; => for return values, and ;;> for printed output.
Okay, now if we take the 33rd element, we expect to see the next chunk of 32 elements being printed:
(take 33 s)
;;> 33
;;> 34
;;> 35
;;> 36
;;> 37
;;> 38
;;> 39
;;> 40
;;> 41
;;> 42
;;> 43
;;> 44
;;> 45
;;> 46
;;> 47
;;> 48
;;> 49
;;> 50
;;> 51
;;> 52
;;> 53
;;> 54
;;> 55
;;> 56
;;> 57
;;> 58
;;> 59
;;> 60
;;> 61
;;> 62
;;> 63
;;> 64
Awesome! Once more, we see that only the next 32 elements were processed, which brings the input consumed so far up to element 64.
This demonstrates that sequence called with a transducer does in fact create a lazy chunked sequence, where elements are only processed when needed (chunk-size at a time).
So what is this passage from the docs about?
"The resulting sequence elements are incrementally computed. These sequences will consume input incrementally as needed and fully realize intermediate operations. This behavior differs from the equivalent operations on lazy sequences."
This is about the order in which the operations happen. With sequence and a transducer:
(sequence (comp A B C) coll)
each element in the chunk goes through A -> B -> C before the next element is touched, so you get:
A(e1) -> B(e1) -> C(e1)
A(e2) -> B(e2) -> C(e2)
...
A(e32) -> B(e32) -> C(e32)
While a normal lazy seq like:
(->> coll A B C)
will first have all the elements of the chunk go through A, then have them all go through B, and then through C:
A(e1)
A(e2)
...
A(e32)
|
B(e1)
B(e2)
...
B(e32)
|
C(e1)
C(e2)
...
C(e32)
This requires an intermediate collection between each step, because the results of A have to be gathered into a collection before B can loop over them, and so on.
We can see this with our previous example:
(def s
  (sequence
    (comp (filter odd?)
          printer
          (map vector)
          printer)
    (range 10)))
(take 1 s)
;;> 1
;;> [1]
;;> 3
;;> [3]
;;> 5
;;> [5]
;;> 7
;;> [7]
;;> 9
;;> [9]
(def l
  (->> (range 10)
       (filter odd?)
       (map #(do (println %) %))
       (map vector)
       (map #(do (println %) %))))
(take 1 l)
;;> 1
;;> 3
;;> 5
;;> 7
;;> 9
;;> [1]
;;> [3]
;;> [5]
;;> [7]
;;> [9]
See how the first one goes filter -> vector -> filter -> vector, and so on, one element at a time, while the second one filters everything first and only then applies vector to everything. This is what the quote from the doc means.
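If you'd rather compare the two orderings as data than read interleaved REPL output, here is a minimal sketch that records each step in an atom. The helper tap and the names xf-log and lazy-log are made up for this illustration; they are not part of clojure.core.

```clojure
;; Hypothetical helper: returns a function that records [tag x] in the
;; atom `log` and then returns x unchanged.
(defn tap [log tag]
  (fn [x]
    (swap! log conj [tag x])
    x))

(def xf-log (atom []))
(def lazy-log (atom []))

;; Transducer version: each element goes through :a and then :b
;; before the next element is touched.
(dorun (sequence (comp (map (tap xf-log :a))
                       (map (tap xf-log :b)))
                 [1 2 3]))

;; Lazy-seq version: the whole (small) chunk goes through :a first,
;; and only then through :b.
(dorun (->> [1 2 3]
            (map (tap lazy-log :a))
            (map (tap lazy-log :b))))

@xf-log   ;; => [[:a 1] [:b 1] [:a 2] [:b 2] [:a 3] [:b 3]]
@lazy-log ;; => [[:a 1] [:a 2] [:a 3] [:b 1] [:b 2] [:b 3]]
```

The input here is smaller than one chunk, so the lazy version runs :a over everything before :b sees anything; with a larger input the same pattern repeats chunk by chunk.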
One more thing: there is also a difference in how the chunking is applied between the two. With sequence and a transducer, elements are processed until the transducer output holds chunk-size results. In the lazy-seq case, each level processes its input one chunk at a time, and only as many chunks as the steps need to do their work.
Here's what I mean:
(def s
  (sequence
    (comp printer
          (filter odd?))
    (range 100)))
(take 1 s)
;;> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
(def l
  (->> (range 100)
       (map #(do (print % "") %))
       (filter odd?)))
(take 1 l)
;;> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Here I modified the printing logic to print on a single line, so it doesn't take as much space. If you look closely, s processed 66 elements of the input range, while l only consumed 32.
The reason for this is what I said above. With sequence, we keep taking input until we have chunk-size results. In this case the chunk size is 32, and since we filter on odd?, it takes roughly two chunks' worth of input to reach 32 results.
With the lazy seq, nothing tries to fill a whole chunk of results. It only pulls as many chunks from the input as the logic needs, and here a single chunk of 32 input elements is enough to find one odd number to take.
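To check these consumption counts without reading printed output, here is a small sketch that counts how many input elements each pipeline touches. The helper counting and the atoms xf-count and lazy-count are hypothetical names introduced for this example, and the exact numbers depend on the chunk size of the Clojure version you run, so treat the comments as what I'd expect rather than a guarantee.

```clojure
;; Hypothetical helper: returns a function that bumps the counter atom
;; and returns its argument unchanged.
(defn counting [counter]
  (fn [x]
    (swap! counter inc)
    x))

(def xf-count (atom 0))
(def lazy-count (atom 0))

;; Same logic as above, but counting instead of printing.
(def s2
  (sequence
    (comp (map (counting xf-count))
          (filter odd?))
    (range 100)))

(def l2
  (->> (range 100)
       (map (counting lazy-count))
       (filter odd?)))

;; Ask each pipeline for a single result.
(first s2)
(first l2)

;; The lazy seq should have touched one input chunk (32 in the run above),
;; while sequence kept consuming until it had a full chunk of odd results.
@lazy-count
@xf-count
```

Both calls to first return 1; only the amount of input consumed to get there differs.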