
I'm trying to find a formal way to think about space complexity in Haskell. I have found this article about the graph reduction (GR) technique, which seems to me like the way to go, but I'm having problems applying it in some cases. Consider the following example:

Say we have a binary tree:

data Tree = Node [Tree] | Leaf [Int]

makeTree :: Int -> Tree
makeTree 0 = Leaf [0..99]
makeTree n = Node [ makeTree (n - 1)
                  , makeTree (n - 1) ]

and two functions that traverse the tree: one (count1) which streams nicely, and the other (count2) which, according to the profiler, creates the whole tree in memory at once.

count1 :: Tree -> Int
count1 (Node xs) = 1 + sum (map count1 xs)
count1 (Leaf xs) = length xs

-- The r parameter should point to the root node to act as a retainer.
count2 :: Tree -> Tree -> Int
count2 r (Node xs) = 1 + sum (map (count2 r) xs)
count2 r (Leaf xs) = length xs
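For reference, here is a minimal, self-contained harness showing how these might be called. The `count2 t t` call pattern follows the comments below (the linked hpaste code); the harness itself is my sketch, not the original test code:

```haskell
-- A minimal harness (an assumption, not the original hpaste code):
-- count2 is called with the root passed twice, so the r argument
-- retains the whole tree for the duration of the traversal.
data Tree = Node [Tree] | Leaf [Int]

makeTree :: Int -> Tree
makeTree 0 = Leaf [0..99]
makeTree n = Node [makeTree (n - 1), makeTree (n - 1)]

count1 :: Tree -> Int
count1 (Node xs) = 1 + sum (map count1 xs)
count1 (Leaf xs) = length xs

count2 :: Tree -> Tree -> Int
count2 r (Node xs) = 1 + sum (map (count2 r) xs)
count2 r (Leaf xs) = length xs

main :: IO ()
main = do
  let t = makeTree 2
  print (count1 t)    -- 403: the root can be dropped once it is matched
  print (count2 t t)  -- 403: same result, but r keeps every node reachable
```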

I think I understand how it works in the case of count1; here is what I think happens in terms of graph reduction:

count1 $ makeTree 2
=> count1 (Node [x1, x2])
=> 1 + sum (map count1 [x1, x2])
=> 1 + count1 x1                                 + sum (map count1 [x2])
=> 1 + (1 + sum (map count1 [x11, x12]))         + sum (map count1 [x2])
=> 1 + (1 + count1 x11 + sum (map count1 [x12])) + sum (map count1 [x2])
=> 1 + (1 + 100 + sum (map count1 [x12]))        + sum (map count1 [x2])
=> 1 + (1 + 100 + count1 x12)                    + sum (map count1 [x2])
=> 1 + (1 + 100 + 100)                           + sum (map count1 [x2])
=> 202                                           + sum (map count1 [x2])
=> ...

I think it's clear from the sequence that it runs in constant space, but what changes in the case of count2?

I understand smart pointers in other languages, so I vaguely understand that the extra r parameter in the count2 function somehow keeps nodes of the tree from being destroyed, but I would like to know the exact mechanism, or at least a formal one that I could use in other cases as well.
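As a side note, even count1 as written is not strictly constant-space (a point FUZxxl raises in the comments below): sum builds its result lazily, at least without optimization, so the pending additions accumulate as thunks. A strict left fold avoids that buildup; this variant is my own sketch, not part of the original code:

```haskell
import Data.List (foldl')

data Tree = Node [Tree] | Leaf [Int]

makeTree :: Int -> Tree
makeTree 0 = Leaf [0..99]
makeTree n = Node [makeTree (n - 1), makeTree (n - 1)]

-- Like count1, but with a strict accumulator: the running total is
-- forced at each step instead of growing as a chain of (+) thunks.
count1' :: Tree -> Int
count1' (Leaf xs) = length xs
count1' (Node xs) = foldl' (\acc t -> acc + count1' t) 1 xs

main :: IO ()
main = print (count1' (makeTree 2))  -- 403, same result as count1
```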

Thanks for looking.

Peter Jankuliak
  • Can you show how you call count2? Your comment indicates that you do something like: let t = makeTree 2 in count2 t t – Ingo Apr 05 '11 at 13:27
  • @Ingo, yes, [here](http://hpaste.org/45318/how_to_reason_about_space_comp) is the code I use for testing. – Peter Jankuliak Apr 05 '11 at 13:33
  • 1
    You should read a bit about garbage collection. – sclv Apr 05 '11 at 14:38
  • The example evaluation you show seems to be not quite O(1). – fuz Apr 05 '11 at 15:00
  • Reasoning about space in Haskell is horrible. – fuz Apr 05 '11 at 15:09
  • @sclv, I like to think that I understand the relevant parts about GCs, but maybe not, could you please be more specific? E.g. if I was working in Java I understand that the tree would not be destroyed as long as I keep the root node stored in some var. What I don't get is the Haskell specific part about it. E.g. in case of _count1_, the caller also retains the pointer to the root node, but that is not stopping it from destroying the intermediate nodes... – Peter Jankuliak Apr 05 '11 at 15:16
  • @Peter, after having seen your code I have a possible explanation for your observation: In the count1 example, even a dumb compiler could figure out that the let-bound variable that points to the root node is not really needed and could inline it at the single point of usage. However, determining whether count2 will actually use its 1st argument is much harder - of course the compiler *could* rewrite count2 r r as count2 () (makeTree 2). Don't know if this helps, but in any case you should not reason about the space behaviour of the original code but of the post-optimizer code. – Ingo Apr 05 '11 at 15:28
  • @FUZxxl, why do you think it is not constant? (here are the images from profiling: [count1](http://imgur.com/YWYtyl&q35Y9) and [count2](http://imgur.com/YWYty&q35Y9l)). Maybe you mean that the total number of allocations isn't O(1)? If so, that is all right, because all those allocations happened in only a small amount of space. – Peter Jankuliak Apr 05 '11 at 15:42
  • @Peter -- where in *count1* does the caller retain the pointer to the root node? As soon as the pattern match is forced, there's no reference left anywhere to the `Node` constructor, just to its contents. – sclv Apr 05 '11 at 16:34
  • 2
    The fact that count2 doesn't run in (almost) constant space is not a property of Haskell, but of a particular Haskell implementation. It would be perfectly reasonable to garbage collect the tree even for count2, but it requires the compiler to prove that the first argument to count2 is never reachable. This is a bit tricky. – augustss Apr 05 '11 at 16:36
  • @sclv, I meant the caller to be the main function, which does retain (indeed, as you say) a pointer to the root if you run count1 twice in a row with the same argument. I mean, I know you are right, just that I believe there is more to the story than just GCs. An exaggerated analogy would be: even if you know how GCs work in Java, it won't tell you the full story about how Eclipse manages its memory. – Peter Jankuliak Apr 05 '11 at 18:39
  • 1
    @Peter -- the let binding in the harness code shouldn't make a difference to GC. The issue is not if the name is in scope, it is if the object is reachable. Many more details in SPJ's two books: http://research.microsoft.com/en-us/um/people/simonpj/papers/papers.html – sclv Apr 05 '11 at 19:05
  • @sclv, thanks for the link, will read. – Peter Jankuliak Apr 05 '11 at 19:14

1 Answer


You could use Adam Bakewell's space semantics,

> Haskell currently lacks a standard operational semantics. We argue that such a semantics should be provided to enable reasoning about operational properties of programs, to ensure that implementations guarantee certain space and time behaviour and to help determine the source of space faults. We present a small-step deterministic semantics for the sequential evaluation of Core Haskell programs and show that it is an accurate model of asymptotic space and time usage. The semantics is a formalisation of a graphical notation so it provides a useful mental model as well as a precise mathematical notation. We discuss its implications for education, programming and implementation. The basic semantics is extended with a monadic IO mechanism so that all the space under the control of an implementation is included.

Or work from the Coq specification of the STG machine.

Don Stewart