1

I have the following Red Black tree:

data Tree a
  = E
  | S a
  | C !Color !(Tree a) !(Tree a)

data Color = R | B

In case of this tree, all the data are stored in the leaves (the S constructor). I have written an insert function like the standard Okasaki red black trees[1] (modifying the parts where the values are stored in the internal nodes)

In this cases I populate the tree with 10 million elements:

l = go 10000000 E
  where
    go 0 t = insert 0 t
    go n t = insert t $ go (n - 1) t

When I try to evaluate the left most element (leaf) of the tree like this:

left :: Tree a -> Maybe a
left E = Nothing
left (S x) = Just x
left (C _ _ l _) = left l

I encounter the following:

left l

*** Exception: stack overflow

Is this owing to the way that I am constructing the tree (non tail recursive) or is there some missing space leak that I cannot see.

Please note the function works fine for a million elements. Additionally I attempted a tail recursive way of the tree construction:

l = go 10000000 E
  where
    go 0 t = insert 0 t
    go n t = go (n - 1) (insert n t)

but encountered the same stack overflow exception.

[1] https://www.cs.tufts.edu/~nr/cs257/archive/chris-okasaki/redblack99.pdf

EDIT

The insert and balance function for completeness:

 insert :: Ord a => a -> Tree a -> Tree a
 insert x xs = makeBlack $ ins xs
   where
     ins E = S x
     ins (S a) = C R (S x) (S a)
     ins (C c l r) = balance c (ins l) r -- always traverse left and trust the balancing
 
     makeBlack (C _ l r) = C B l r
     makeBlack a = a
 
 balance :: Color -> Tree a -> Tree a -> Tree a
 balance B (C R (C R a b) c) d = C R (C B a b) (C B c d)
 balance B (C R a (C R b c)) d = C R (C B a b) (C B c d)
 balance B a (C R (C R b c) d) = C R (C B a b) (C B c d)
 balance B a (C R b (C R c d)) = C R (C B a b) (C B c d)
 balance color a b = C color a b

There was mistyping from my end while typing in the insert code, it is insert n $ go (n - 1) t and not insert t $ go (n - 1) t. However when actually encountering the stack overflow the code was correct and the overflow happened in ghci.

Community
  • 1
  • 1
Abhiroop Sarkar
  • 2,251
  • 1
  • 26
  • 45
  • 1
    Perhaps you should post an [MCVE](https://stackoverflow.com/help/mcve). – chi Jan 01 '19 at 13:20
  • @chi I have added the insert and balance to provide completeness. – Abhiroop Sarkar Jan 01 '19 at 13:30
  • I had to debug your code a bit to make it compile. After that, it's true that it allocates a lot of memory. I was able to run it with 10M elements (using GHC, not GHCi), but it took several GBs of RAM in the process. Perhaps it requires ~500B/element. One could use unboxing to reduce that overhead. I'm unsure about how much memory we can save here. – chi Jan 01 '19 at 14:14
  • @melpomene Added the Color type – Abhiroop Sarkar Jan 01 '19 at 14:49
  • @chi Yes in my machine, using dandiaz's solution the program allocates close to 14GB of memory. – Abhiroop Sarkar Jan 01 '19 at 15:17

1 Answers1

3

The first example of insertion code has a bug: it tries to insert the tree itself as an element.

The second version

l = go 10000000 L.empty   where
    go 0 t = L.cons 0 t
    go n t = go (n - 1) (L.cons n t)

Is indeed tail recursive, but it still has a problem: it doesn't at any step "force" the tree while it is being constructed. Due to Haskell's laziness, go will return a thunk that hides 10000000 pending applications of L.cons.

When the runtime tries to "pop" that thunk, it will put each n variable in the stack while the thunk below is being "popped" in its turn, causing the stack overflow. "Function calls don't add stack frames in Haskell; instead, stack frames come from nesting thunks."

The solution is to force each intermediate tree to WHNF, so that thunks don't accumulate. This should be enough (using the BangPatterns extension):

l :: Tree Int 
l = go 10000000 L.empty
  where
    go 0 !t = L.cons 0 t
    go n !t = go (n - 1) (L.cons n t)

This basically means: "before recursing to add another element, make sure the accumulator is in WHNF". The n need not be forced because it is scrutinized in the pattern-match.

danidiaz
  • 26,936
  • 4
  • 45
  • 95
  • On my machine the above runs with the following: 14,786,460,656 bytes allocated in the heap 2,669,788,232 bytes copied during GC 598,027,032 bytes maximum residency (12 sample(s)) 1,447,144 bytes maximum slop 1169 MB total memory in use (0 MB lost due to fragmentation) Is this normal or using excess allocation? No additional compiler flags passed. – Abhiroop Sarkar Jan 01 '19 at 15:14
  • @AbhiroopSarkar (598027032 bytes of maximum residence / 10000000) gives around 60 bytes per node, which actually seems low to me, given how the structure is defined. (As a lower bound to memory consumption, we have to store one Int for each element, and also a pointer to it...) – danidiaz Jan 01 '19 at 15:31