Should cons be inside (lazy-seq ...)

(def lseq-in (lazy-seq (cons 1 (more-one))))

or out?

(def lseq-out (cons 1 (lazy-seq (more-one))))

I noticed

(realized? lseq-in)
        ;;; ⇒ false

(realized? lseq-out)
        ;;; ⇒ <err>
        ;;;   ClassCastException clojure.lang.Cons cannot be cast to clojure.lang.IPending  clojure.core/realized? (core.clj:6773)

All the examples on clojuredocs.org use "out".

What are the tradeoffs involved?

event_jr

2 Answers

You definitely want (lazy-seq (cons ...)) as your default, deviating only if you have a clear reason for it. clojuredocs.org is fine, but the examples are all community-provided and I would not call them "the docs". Of course, a consequence of how it's built is that the examples tend to get written by people who just learned how to use the construct in question and want to help out, so many of them are poor. I would refer instead to the code in clojure.core, or other known-good code.

Why should this be the default? Consider these two implementations of map:

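;; cons outside: the first element is computed as soon as map1 is called
(defn map1 [f coll]
  (when-let [s (seq coll)]
    (cons (f (first s))
          (lazy-seq (map1 f (rest s))))))

;; cons inside: nothing is computed until the result is first consumed
(defn map2 [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s))
            (map2 f (rest s))))))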

If you call (map1 prn xs), then an element of xs will be realized and printed immediately, even if you never intentionally realize an element of the resulting mapped sequence. map2, on the other hand, immediately returns a lazy sequence, delaying all its work until an element is requested.
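As a rough illustration of that difference, the sketch below repeats both definitions and records side effects in an atom instead of printing. The names `calls` and `noisy` are made up for this example and do not come from clojure.core.

```clojure
(defn map1 [f coll]
  (when-let [s (seq coll)]
    (cons (f (first s))
          (lazy-seq (map1 f (rest s))))))

(defn map2 [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s))
            (map2 f (rest s))))))

;; Hypothetical helper: remembers every element it has been applied to.
(def calls (atom []))
(defn noisy [x] (swap! calls conj x) x)

;; cons outside: calling map1 already applies noisy to the first element.
(def m1 (map1 noisy [1 2 3]))
(def after-map1 @calls)

;; cons inside: calling map2 does no work at all yet.
(reset! calls [])
(def m2 (map2 noisy [1 2 3]))
(def after-map2 @calls)

;; Only forcing the sequence runs noisy over the elements.
(def forced (doall m2))
(def after-doall @calls)
```

Here `after-map1` already holds the first element, `after-map2` is empty, and `after-doall` holds all three elements once the sequence has been forced.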

amalloy
  • line-seq has `cons` on the outside and has already caused some confusion http://stackoverflow.com/questions/15182702/why-is-line-seq-returning-clojure-lang-cons-instead-of-clojure-lang-lazyseq?rq=1 – event_jr Jun 10 '13 at 02:13
  • In fact, isn't it a bug for `line-seq` to consume the first line when no one has asked for it? – event_jr Jun 10 '13 at 06:24
  • I do think `line-seq` is "wrong", but it's less crucial that `line-seq` be fully lazy, since it doesn't call an arbitrary function like `map` does. My guess was that `line-seq` is so old that its code was written back when lazy sequences worked differently than they do now. You can see some interesting history of line-seq at http://dev.clojure.org/jira/browse/CLJ-222. – amalloy Jun 10 '13 at 06:58

With cons inside lazy-seq, the evaluation of the expression for the first element of your seq gets deferred; with cons on the outside, it's done right away and only the construction of the "rest" part of the seq is deferred. (So (rest lseq-out) will be a lazy seq.)

Thus, if computing the first element is expensive and it might not be needed at all, putting cons inside lazy-seq makes more sense. If the initial element is supplied to the lazy seq producer as an argument, it may make more sense to use cons on the outside (this is the case with clojure.core/iterate). Otherwise it doesn't make that much of a difference. (The overhead of creating a lazy seq object at the start is negligible.)
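A minimal sketch of the "initial element supplied as an argument" case, assuming a simplified stand-in for clojure.core/iterate. `iterate-out`, `f-calls` and `counting-inc` are hypothetical names used only here, and this is not the actual clojure.core source.

```clojure
;; cons on the outside: x is already known, so only the call to f is deferred.
(defn iterate-out [f x]
  (cons x (lazy-seq (iterate-out f (f x)))))

;; Hypothetical helper: counts how often the step function runs.
(def f-calls (atom 0))
(defn counting-inc [n] (swap! f-calls inc) (inc n))

(def s (iterate-out counting-inc 0))
(def head (first s))
;; Taking the first element does not require calling the step function.
(def calls-after-first @f-calls)

;; Walking further into the seq is what triggers the calls to f.
(def first-three (doall (take 3 s)))
```

Since `x` costs nothing to produce, wrapping the outer `cons` in another `lazy-seq` would defer nothing useful; the single inner `lazy-seq` is what keeps `(f x)` from running early.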

Clojure itself uses both approaches (although in the majority of cases lazy-seq wraps the whole seq-producing expression, which may not necessarily start with cons).

Michał Marczyk
  • Good point about when the initial element is supplied as an argument; that's the first time I've heard of that. – ToBeReplaced Jun 10 '13 at 00:46
  • Hmmm. I'm not sure about the initial element supplied argument. Wouldn't the calculated value be presented at almost the same speed whether inside `lazy-seq` or not? In fact, the documentation for `iterate` says it will return a lazy-seq, but an exception is thrown when I call `realized?` on it. Isn't that actually a bug? – event_jr Jun 10 '13 at 06:32
  • You want `iterate` not to compute `(f x)` unless it is necessary. Therefore the consing of `x` to the front needs to be separated from the call to `f` by a `lazy-seq` layer. Thus, you could either define it as `(cons x (lazy-seq ...))` or `(lazy-seq (cons x (lazy-seq ...)))`; but then in the second case you end up with double the number of `lazy-seq` layers. As for `realized?`, that's a good point -- perhaps the docstring could use improvement. (Or you could argue that `realized?` should simply return `true` for non-lazy seqs, although my first instinct is that that's not the right way... Hm.) – Michał Marczyk Jun 10 '13 at 06:42
  • I still don't understand the second case. It may be a fundamental gap in my knowledge, but I'm playing around with `(defn iterate2 [f x] (lazy-seq (cons x (do (println "... iterate called") (iterate2 f (f x))))))`, and (f x) is not being called when I call `first` on the lazy seq. – event_jr Jun 10 '13 at 15:39
  • I have just now pasted your definition into a 1.5.1 REPL and entered `(first (iterate2 inc 0))`. As a result, I got an "... iterate called" printout and a return value of 0, as expected. In contrast, `(first (iterate #(do (println "...") (inc %)) 0))` returns 0 without printing anything. – Michał Marczyk Jun 10 '13 at 15:47