
So the short version of my question is: how are we supposed to encode loops in Haskell, in general? There is no tail-call optimization guarantee in Haskell, bang patterns aren't even part of the standard (right?), and the fold/unfold paradigm is not guaranteed to work in every situation. Here's a case in point, where only bang patterns did the trick in making it run in constant space (not even using $! helped ... although the testing was done at Ideone.com, which uses ghc-6.8.2).

It is basically about a nested loop, which in the list paradigm can be stated as

prod (sum,concat) . unzip $ 
    [ (c, [r | t]) | k<-[0..kmax], j<-[0..jmax], let (c,r,t)=...]
prod (f,g) x = (f.fst $ x, g.snd $ x)

Or in pseudocode:

let count = 0, list_store = [] in
for k from 0 to kmax
    for j from 0 to jmax
        if test(k,j)
            list_store += [entry(k,j)]
        count += local_count(k,j)
result = (count, list_store)

Until I added the bang pattern, I got either a memory blow-up or even a stack overflow. But bang patterns are not part of the standard, right? So the question is: how is one to code the above, in standard Haskell, so that it runs in constant space?

Here is the test code. The calculation is fake, but the problems are the same. EDIT: The foldr-formulated code is:

testR m n = foldr f (0,[])
               [ (c, [(i,j) | i+j == d])
                 | i <- [0..m], j <- [0..n],
                   let c = if rem j 3 == 0 then 2 else 1 ]
  where
    d = m + n - 3
    f (!c1, [])    (!c, h) = (c1+c, h)
    f (!c1, (x:_)) (!c, h) = (c1+c, x:h)

Trying to run print $ testR 1000 1000 produces a stack overflow. Changing to foldl succeeds only if f uses bang patterns, but then it builds the list in reversed order. I'd like to build it lazily, and in the right order. Can it be done with any kind of fold, for an idiomatic solution?

EDIT: to sum up the answer I got from @ehird: there's nothing to fear in using bang patterns. Though not part of standard Haskell itself, they are easily encoded in it as f ... c ... = case (seq c False) of {True -> undefined; _ -> ...}. The lesson is that only pattern matching forces a value; seq does NOT force anything by itself, but rather arranges that when seq x y is forced - by a pattern match - x will be forced too, and y will be the answer. Contrary to what I understood from the online Report, $! does NOT force anything by itself either, though it is called a "strict application operator".

And the point from @stephentetley: strictness is very important in controlling space behaviour. So it is perfectly OK to encode loops in Haskell, using strictness annotations (bang patterns) where needed, to write whatever special folding (i.e. structure-consuming) function the problem calls for - as I ended up doing in the first place - and to rely on GHC to optimize the code.

Thank you very much to all for your help.

Will Ness
  • I'm not sure if there is a "general" way, as quite often the correct way is to work out what the problem requires rather than translating imperative code... – ivanm Feb 05 '12 at 12:31
  • @ivanm surely a language should have means to encode such a basic construct as loop, or an equivalent? – Will Ness Feb 05 '12 at 12:33
  • It is possible to iterate through data, etc.; but generally when people ask "how do I write a loop in Haskell", it's because they're trying to do a line-by-line port of some imperative code, which is usually not the best way of doing it. Also, what makes the notion of a loop such a "basic construct"? I would argue that the loop is just the imperative viewpoint of iterating through data... – ivanm Feb 05 '12 at 12:40
  • @ivanm creation of data by iteration through generated data in constant space ... or a "loop", whatever name we choose for it. I myself rather prefer tail-recursive formulations where possible, but that's just syntax. An iterative process, i.e. one running in constant space, surely is a basic notion of computation (re SICP etc.). E.g. in Prolog the only way to encode a loop is through tail recursion. But Haskell has no tail recursion, so *how* is one to write loops in it? – Will Ness Feb 05 '12 at 12:58
  • @Will Ness. If you write tail recursive code in Haskell, GHC will perform tail call optimization on it. Haskell has as much tail recursion as OCaml, Scheme etc. you just have to make sure you actually write tail recursive functions (newcomers usually miss out the accumulator and just put the recursive call last). Also with Haskell being a lazy language, you might commonly want laziness rather than tail recursive (strict) operations. – stephen tetley Feb 05 '12 at 13:16
  • @stephentetley that's the problem, it didn't, not until I used bang patterns. Here's [the code](http://ideone.com/uSyGI). – Will Ness Feb 05 '12 at 14:12
  • @Will Ness - you would get tail call optimization from the compiler (provided you compiled with optimization and wrote genuinely tail recursive code), but if you were lazily building the accumulator you would get a big thunk. TCO and forcing are different things - for _strict_ programming in Haskell you may need both. – stephen tetley Feb 05 '12 at 14:20
  • @stephentetley the whole point is that there is no tail call optimization guarantee in Haskell standard, and people usually say "write idiomatic code using `fold` instead!" but here no built-in fold is working (`foldl` builds the list in reversed order, `foldr` blows the stack up), and so I must write my own special kind of fold, and for that I **have** to have TCO ([*modulo cons*](http://rosettacode.org/wiki/Hamming_numbers#Direct_calculation_through_triples_enumeration) to be more precise). – Will Ness Feb 05 '12 at 19:28
  • Have you tried `foldl'`? – Dan Burton Feb 05 '12 at 20:37
  • @Will Ness - you are splitting a very small semantic hair claiming "Haskell has no tail recursion" - presumably you meant TCO anyway? - and saying that the standard makes no guarantee of TCO, when GHC (commonly synonymous with "Haskell" unless differentiated otherwise) performs TCO when you compile with optimizations. I'll concede, then, that "Haskell" has less TCO than Scheme, but maintain that it is still comparable to OCaml. – stephen tetley Feb 05 '12 at 20:50
  • @stephentetley I "claim" what I've been told on IRC, nothing more (I've read it somewhere too, but don't remember where). There is no guarantee of tail call optimization in the Haskell standard. And that makes me feel uneasy. If GHC had such a guarantee when compiling with -O2, it would be very good news indeed. But what I usually hear is that it "will usually perform the TCO". I'd like to know if there is a guarantee for it, please. – Will Ness Feb 05 '12 at 21:05
  • @DanBurton yes, but then the list is created in reversed order. – Will Ness Feb 05 '12 at 21:06
  • @stephentetley and it's more complicated than just plain TCO, it's TCO *modulo cons* that we usually need. It's the constant space operation that is the real goal, of course, whatever it's called. – Will Ness Feb 05 '12 at 21:13
  • @WillNess you could always use [Data.Sequence](http://hackage.haskell.org/packages/archive/containers/latest/doc/html/Data-Sequence.html) and create the list by appending to the end instead. This would, of course, be strict rather than lazy. Looking into it further... – Dan Burton Feb 05 '12 at 21:19
  • @Will Ness - from what I read on Wikipedia, "TCO modulo cons" is an implementation strategy in David Warren's Prolog that generalizes TCO a bit. As far as I know you don't have this in Haskell (nor in Scheme, OCaml...) so you can't use it - you have to code your algorithm in tail-recursive style sans cons. You also have to be mindful of the strictness of your accumulator (a separate concern to TCO). If you are using pairs, which your Hamming code appears to be doing in the recursive component, you have to be very careful about the "Wadler pair space leak". – stephen tetley Feb 05 '12 at 21:26
  • @stephentetley I use it as a metaphor mostly. It is very similar - almost synonymous, actually - to guarded recursion, so we do have this in Haskell, or so I hope. :) Will search for "Wadler pair space leak", thanks for the pointer. :) – Will Ness Feb 05 '12 at 21:31
  • @WillNess So... How much of this comment thread can be cleaned up/moved into the question/answers? It's getting quite crowded and most of it is from you. Would you please do some housecleaning? – casperOne Feb 06 '12 at 16:19
  • @WillNess Is what you wrote in the comments directly related to your question? If so, edit the question to reflect that. If not, then delete your comments (click the little red `x` next to the comment when you hover over it). – casperOne Feb 06 '12 at 20:15

2 Answers


Bang patterns are simply sugar for seq — whenever you see let !x = y in z, that can be translated into let x = y in x `seq` z. seq is standard, so there's no issue with translating programs that use bang patterns into a portable form.
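For instance, here is that correspondence spelled out as a compilable sketch (f1 and f2 are made-up names; the first needs the BangPatterns extension, the second is standard Haskell):

```haskell
{-# LANGUAGE BangPatterns #-}

-- With a bang pattern: x is forced before the body is evaluated.
f1 :: Int -> Int
f1 y = let !x = y + 1 in x * 2

-- The same thing in standard Haskell, via seq.
f2 :: Int -> Int
f2 y = let x = y + 1 in x `seq` (x * 2)

main :: IO ()
main = print (f1 20, f2 20)  -- prints (42,42)
```

Both definitions force x to weak head normal form before the multiplication is evaluated.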

It is true that Haskell makes no guarantees about performance — the report does not even define an evaluation order (only that it must be non-strict), let alone the existence or behaviour of a runtime stack. However, while the report doesn't specify a specific method of implementation, you can certainly optimise for one.

For example, call-by-need (and thus sharing) is used by all Haskell implementations in practice, and is vital for optimising Haskell code for memory usage and speed. Indeed, the pure memoisation trick1 relies on sharing (without it, it'll just slow things down).

This basic structure lets us see, for example, that stack overflows are caused by building up too-large thunks. Since you haven't posted your entire code, I can't tell you how to rewrite it without bang patterns, but I suspect [ (c, [r | t]) | ... ] should become [ c `seq` r `seq` t `seq` (c, [r | t]) | ... ]. Of course, bang patterns are more convenient; that's why they're such a common extension! (On the other hand, you probably don't need to force all of those; knowing what to force is entirely dependent on the specific structure of the code, and wildly adding bang patterns to everything usually just slows things down.)

Indeed, "tail recursion" per se does not mean all that much in Haskell: if your accumulator parameters aren't strict, you'll overflow the stack when you later try to force them, and indeed, thanks to laziness, many non-tail-recursive programs don't overflow the stack; printing repeat 1 won't ever overflow the stack, even though the definition — repeat x = x : repeat x — clearly has recursion in a non-tail position. This is because (:) is lazy in its second argument; if you traverse the list, you'll have constant space usage, as the repeat x thunks are forced, and the previous cons cells are thrown away by the garbage collector.
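A minimal demonstration of that point (myRepeat is a stand-in for the Prelude's repeat):

```haskell
-- The recursive call is NOT in tail position; it sits under (:).
-- Since (:) is lazy in its second argument, each cell is produced on
-- demand and the already-consumed prefix is garbage-collected, so the
-- traversal runs in constant space despite the non-tail recursion.
myRepeat :: a -> [a]
myRepeat x = x : myRepeat x

main :: IO ()
main = print (sum (take 100000 (myRepeat (1 :: Int))))  -- prints 100000
```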

On a more philosophical note, tail-recursive loops are generally considered suboptimal in Haskell. In general, rather than iteratively computing a result in steps, we prefer to generate a structure with all the step-equivalents at the leaves, and do a transformation (like a fold) on it to produce the final result. This is a much higher-level view of things, made efficient by laziness (the structure is built up and garbage-collected as it's processed, rather than all at once).2

This can take some getting used to at first, and it certainly doesn't work in all cases — extremely complicated loop structures might be a pain to translate efficiently3 — but directly translating tail-recursive loops into Haskell can be painful precisely because it isn't really all that idiomatic.

As far as the paste you linked to goes, id $! x doesn't work to force anything because it's the same as x `seq` id x, which is the same as x `seq` x, which is the same as x. Basically, whenever x `seq` y is forced, x is forced, and the result is y. You can't use seq to just force things at arbitrary points; you use it to cause the forcing of thunks to depend on other thunks.
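To see that concretely (a tiny sketch): binding `id $! undefined` with a lazy let raises no error, because nothing ever demands the resulting thunk:

```haskell
-- id $! x is x `seq` id x: a thunk that, WHEN forced, forces x first.
-- A lazy let binding demands nothing, so even undefined slips through.
main :: IO ()
main = do
  let _t = id $! (undefined :: Int)  -- no error: _t is never demanded
  putStrLn "nothing was forced"
```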

In this case, the problem is that you're building up a large thunk in c, so you probably want to make auxk and auxj force it; a simple method would be to add a clause like auxj _ _ c _ | seq c False = undefined to the top of the definition. (The guard is always checked, forcing c to be evaluated, but always results in False, so the right-hand side is never evaluated.)
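As a self-contained sketch of that guard trick (count is a made-up example here, not your auxk/auxj):

```haskell
-- The guard forces acc on every call, then always fails, so control
-- falls through to the clauses below; acc thus never builds up a thunk.
count :: Int -> Int -> Int
count acc n | seq acc False = undefined
count acc 0 = acc
count acc n = count (acc + 1) (n - 1)

main :: IO ()
main = print (count 0 100000)  -- prints 100000, in constant space
```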

Personally, I would suggest keeping the bang pattern you have in the final version, as it's more readable, but f c _ | seq c False = undefined would work just as well too.

1 See Elegant memoization with functional memo tries and the data-memocombinators library.

2 Indeed, GHC can often even eliminate the intermediate structure entirely using fusion and deforestation, producing machine code similar to how the computation would be written in a low-level imperative language.

3 Although if you have such loops, it's quite possible that this style of programming will help you simplify them — laziness means that you can easily separate independent parts of a computation out into separate structures, then filter and combine them, without worrying that you'll be duplicating work by making intermediate computations that will later be thrown away.

ehird
  • thanks, I'll read your answer closely but in the meantime, have you had a look at [that code](http://ideone.com/AkbN1) that I linked? I couldn't make it run in constant space with just ($!). `case id $! (c+i+1) of { c' -> ...` didn't work until I added bang pattern to `!c' -> ...` specifically. – Will Ness Feb 05 '12 at 13:34
  • also please note, I provided links to two full versions of code in question. – Will Ness Feb 05 '12 at 13:36
  • about `repeat x = x : repeat x`, the frequently repeated assertion that it is not tail recursive is not quite right IMO; more appropriate is to see it as being *tail recursive modulo cons* which under lazy evaluation IMO is synonymous with *corecursion* - what I referred to as fold/unfold paradigm (is it *hylomorphism*? I don't remember). That is fine, and *if* it were guaranteed to always run in constant space where appropriate, I'd be happy indeed. But my code refused to run in constant space in such formulation too, for nested loop. With one comprehension it did run in constant space. – Will Ness Feb 05 '12 at 13:44
  • @WillNess: I've updated my answer to explain what's wrong with the code you linked. Tail recursion modulo cons basically *is* guaranteed to work if you traverse the result correctly in GHC, though, so there must be some other problem with your original code. But your final solution seems fine to me. – ehird Feb 05 '12 at 13:47
  • about "creating structure and transforming it", that is what I started from, with `prod (sum,concat) . unzip $ ...`. But laziness did *not* make it efficient, as I was hoping, and gc did *not* throw away unneeded cells. The memory footprint was large, and growing. Basically, all the empty lists were retained which should have been consumed by `concat`, but weren't. Please do take a look at the [full code](http://rosettacode.org/wiki/Hamming_numbers#Direct_calculation_through_triples_enumeration) to which I [linked above](http://ideone.com/AkbN1). – Will Ness Feb 05 '12 at 13:50
  • @WillNess: You haven't shown the original structure-based version of the code, so I can't really comment on what could be wrong with it, but it's likely that your fold was simply insufficiently strict. – ehird Feb 05 '12 at 13:56
  • thanks a lot! for your suggestion, it would indeed work in standard Haskell and looks equivalent to using bang patterns. I would rather prefer more idiomatic version, as you also suggest, similar to that with which I started: `prod (sum,concat) . unzip $ [ (c, [r | t]) | k<-[0..x], j<-[0..y], let ...]`. Could you think of a way *it* could be done, instead of the bothersome functions encoding I ended up with? Thanks a lot! – Will Ness Feb 05 '12 at 13:57
  • It was just like I quoted here, `prod(sum,concat) . unzip $ ...`. I think I tried `fold` version too, will search for it and get back to you. Thanks. – Will Ness Feb 05 '12 at 13:58
  • Did you compile it with `-O2`? Also, what type does `c` have? Failing that, the problem may be in the `...`; could you add a complete version of the code to your question so I can test it? – ehird Feb 05 '12 at 14:01
  • as far as I remember the `foldl'` solution worked too, just **if** I used it with a function with bang patterns - but it built the list in the reverse order. I would prefer it if it was standard - i.e. without (!), and built the list in its true order. Will search for a specific link. – Will Ness Feb 05 '12 at 14:02
  • It's `Int`. Here is [the test code](http://ideone.com/uSyGI). The calculation is fake, but the problems are the same. – Will Ness Feb 05 '12 at 14:06

OK let's work from the ground up here.

You have a list of entries

entries = [(k,j) | j <- [0..jmax], k <- [0..kmax]]

And based on those indexes, you have tests and counts

tests m n = map (\(k,j) -> j + k == m + n - 3) entries
counts = map (\(_,j) -> if (rem j 3) == 0 then 2 else 1) entries

Now you want to build up two things: a "total" count, and the list of entries that "pass" the test. The problem, of course, is that you want to generate the latter lazily, while the former (to avoid exploding the stack) should be evaluated strictly.

If you evaluate these two things separately, then you must either 1) prevent sharing entries (generate it twice, once for each calculation), or 2) keep the entire entries list in memory. If you evaluate them together, then you must either 1) evaluate strictly, or 2) have a lot of stack space for the huge thunk created for the count. Option #2 for both cases is rather bad. Your imperative solution deals with this problem simply by evaluating simultaneously and strictly. For a solution in Haskell, you could take Option #1 for either the separate or the simultaneous evaluation. Or you could show us your "real" code and maybe we could help you find a way to rearrange your data dependencies; it may turn out you don't need the total count, or something like that.
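For what it's worth, here is a sketch of the strict-simultaneous option in standard Haskell, reusing the fake test and count from the question (loop and go are made-up names):

```haskell
-- Strict count, lazy list, one pass. The guard forces the running
-- count at each step; the (:) in non-tail position ("tail recursion
-- modulo cons") lets the result list be consumed lazily, in order.
loop :: Int -> Int -> (Int, [(Int, Int)])
loop m n = go 0 [ (i, j) | i <- [0 .. m], j <- [0 .. n] ]
  where
    d = m + n - 3
    go acc _ | seq acc False = undefined
    go acc []           = (acc, [])
    go acc ((i, j) : r)
      | i + j == d      = let (c, xs) = go (acc + w) r
                          in  (c, (i, j) : xs)
      | otherwise       = go (acc + w) r
      where w = if rem j 3 == 0 then 2 else 1  -- the question's fake count

main :: IO ()
main = print (loop 3 3)  -- prints (24,[(0,3),(1,2),(2,1),(3,0)])
```

Walking the list forces the accumulator step by step, so neither a reversed list nor a count thunk builds up.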

Dan Burton
  • thanks for answering. I've linked to the real code in the question; it is [here](http://rosettacode.org/wiki/Hamming_numbers#Direct_calculation_through_triples_enumeration). I was just uneasy about using bang patterns, since they are not in the standard, but ehird showed me how they can be encoded in standard Haskell, so it's OK now. I ended up writing a special kind of fold there, and GHC did OK with it. – Will Ness Feb 05 '12 at 22:21