16
user=> (def r (range 1))
user=> (for [a r, b r, c r, d r, e r, f r, g r, h r :when (and (= 0 a) (not= 1 b))]
          (list a b c d e f g h))
((0 0 0 0 0 0 0 0))
user=> (doseq [a r, b r, c r, d r, e r, f r, g r, h r :when (and (= 0 a) (not= 1 b))]
          (println (list a b c d e f g h)))
CompilerException java.lang.RuntimeException: Method code too large!, compiling:(/tmp/form-init8346140986526777871.clj:1:1)

This appears to come from clojure.asm.MethodWriter. My Googling for this error with Clojure turns up almost no hits.

So...what on earth is going on? How deep does this rabbit hole go? Is this one line of Clojure code really producing a >65KB method (the value comes from MethodWriter's source)?

If this answer is hitting on the issue I'm running into, then (a) why does chunking mean it grows exponentially instead of linearly? And (b) what are the ramifications for me as a programmer? For example, is this behavior well-known and intended? Should I avoid using doseq for any situation with more than 3 or 4 bindings? How does this compare to using for and doall?

Perhaps related:

Clojure doseq generates huge code

Method code too large! exception using ASM

Community
  • 1
  • 1
galdre
  • 2,319
  • 17
  • 31
  • 2
    It looks like every extra binding in your doseq doubles the size of the generated code (I tried it with no.disassemble). Not sure why. – Diego Basch Oct 14 '14 at 07:06
  • +1 Wooha! Nice catch! I'm really interested to know more about that. – Chiron Oct 14 '14 at 08:59
  • My naive expectation would be that between `for` and `doseq`, it would be `for` that would produce the larger method. If `doseq` has this kind of inefficiency underneath, then I'd be interested in knowing at what threshold of complexity you'd get better performance going with `(def ignore (doall (for [stuff] (side-effect)))` instead of the more natural `(doseq [stuff] (side-effect))`. – galdre Oct 14 '14 at 12:20
  • @DiegoBasch, I just checked out no.disassemble. That's very cool. Thanks for the reference. :) – galdre Oct 14 '14 at 12:22
  • The accepted answer in the first question you linked explains why this is happening. The macroexpansion for doseq has a branch to handle chunked seqs for each bound seq expression, meaning the expanded code doubles in size for each new binding. – Alex Oct 14 '14 at 13:20
  • @Alex Honestly, I didn't find that answer to be super helpful. He showed that `doseq` generates more code (the OP already knew this), and that this is related to chunking. Why does chunking mean it grows exponentially instead of linearly? Is this behavior well-known and intended? Does it come with an advantage over using `for` and `doall`? – galdre Oct 14 '14 at 13:56
  • @galdre If you find the accepted answer to a question that's almost an exact duplicate of yours to be unsatisfactory, then perhaps you should edit your question to highlight your specific concerns. – Alex Oct 14 '14 at 14:32
  • Alright -- I'll do that. For the record, though, it wasn't exactly obvious to me that the two questions were duplicates, which is why I listed it as "perhaps related." I'm afraid my understanding of Clojure under the hood is pretty scant. – galdre Oct 14 '14 at 15:33
  • @Alex Should I mark this as a duplicate? I am not satisfied with the answer to the original question (and I think my question is more concise and easier to find), but I'm not sure what the proper SO flow is in this case. A bounty on the original question? – galdre Oct 14 '14 at 16:52
  • I think it's fine to leave this question open. The question in its edited form is different enough from the original. – Alex Oct 14 '14 at 16:54

2 Answers2

9

What you're seeing is a nasty side effect of an optimization that was put into the implementation of the doseq macro to handle chunked sequences in the input. The answer of the question that you linked correctly describes the underlying cause, but doesn't shed a whole lot of light on why things happen the way they do.

The doseq implementation internally uses a function that recursively builds up a series of nested loop constructs, one loop for each level of bindings in the doseq. In a naive, unoptimized version of this implementation, the loop at each level would simply run its body and then call recur with the next value for its seq. Something along these lines:

(loop [s (seq input)]
  (if s
    (do (run-body (first s))
        (recur (next s)))))

If that seq happens to be a chunked sequence, though, this will cause the unnecessary creation of lots of intermediate seq objects that are never used outside the body of the loop. The optmization that doseq has made is to put an if inside the loop with one branch to handle chunked sequences and one to handle non-chunked sequences. The loop body is duplicated between each branch. If the loop body happens to be a nested loop, then you can see how the exponential increase in code size happens - the loop at each level of the expanded code has two child loops.

So, to answer your question, I wouldn't exactly say the explosion in code size is intended, but it's a consequence of the designed behavior of doseq. It just wasn't designed to handled deeply nested loops, and in the wild I've never seen it used with more than one or two levels of bindings.

You can reproduce the semantics of a deeply nested doseq with a combination of for and dorun (don't used doall as this unnecessarily retains the head of the seq). This will allow you to handle any level of nesting, with a slight but measurable performance hit if you happen to be running through a chunked sequence in a tight loop.

user> (time (doseq [x (range 10000) y (range 10000)] (* x y)))
"Elapsed time: 2933.543178 msecs"

user> (time (dorun (for [x (range 10000) y (range 10000)] (* x y))))
"Elapsed time: 5560.90003 msecs"
Alex
  • 13,811
  • 1
  • 37
  • 50
1

I had a similar problem when I was creating my own compiler using java.

I declared a very large matrix. For me the solution was separated into small matrix. It's just a suggestion, maybe you can do something similar, such as:

(def r (range 1))
(defn foo [a b c d]
   (doseq [e r, f r, g r, h r] (println "Hi")))
(doseq [a r, b r, c r, d r :when (and (= 0 a) (not= 1 b))]
   (foo a b c d))
Shog9
  • 156,901
  • 35
  • 231
  • 235
Gustavo Rossi Muller
  • 1,062
  • 14
  • 18
  • 2
    I think your answer may be good but it would be helpful to provide more detail – StormeHawke Oct 14 '14 at 12:34
  • This does not really answer the question. If you have a different question, you can ask it by clicking [Ask Question](http://stackoverflow.com/questions/ask). You can also [add a bounty](http://stackoverflow.com/help/privileges/set-bounties) to draw more attention to this question once you have enough [reputation](http://stackoverflow.com/help/whats-reputation). – vyegorov Oct 14 '14 at 12:35
  • 4
    @vyegorov, actually his answer spells out all the information necessary to find a workaround. I'm not looking for a workaround exactly, but his answer was certainly more helpful than your stock pasted reviews. – galdre Oct 14 '14 at 13:22