
*Purely Functional Data Structures* has the following exercise:

-- 2.5 Sharing can be useful within a single object, not just between objects.
-- For example, if the two subtrees of a given node are identical, then they can
-- be represented by the same tree.
-- Part a: make a `complete a Int` function that creates a tree of
-- depth Int, putting a in every node of the tree.
data Tree a = Empty | Node a (Tree a) (Tree a)

complete :: a -> Integer -> Maybe (Tree a)
complete x depth
 | depth < 0  = Nothing
 | otherwise  = Just $ complete' depth
                        where complete' d
                                | d == 0    = Empty
                                | otherwise = let copiedTree = complete' (d-1)
                                              in Node x copiedTree copiedTree
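
For concreteness, here is what a small call prints in GHCi (assuming a hypothetical deriving Show is added to Tree):

λ> complete 'x' 2
Just (Node 'x' (Node 'x' Empty Empty) (Node 'x' Empty Empty))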

Does this implementation run in O(d) time? Could you please say why or why not?

Kevin Meredith
  • What do you think? Why? – dfeuer Feb 25 '15 at 02:58
  • if the *depth* is 1, then the following executes: `complete 1` -> `complete' 1` -> `complete' 0`, which returns `Empty` - so I'm not sure. My confusion here is understanding what counts towards the `O(d)` count. – Kevin Meredith Feb 25 '15 at 03:01
  • Mostly, you should count applying constructors, matching on constructors, and performing arithmetic. Note that guards and `if` are essentially pattern matches on `Bool`, so they count too. – dfeuer Feb 25 '15 at 03:17
  • so calling `complete _ 1` will do the following operations: (1) `depth < 0` (2) `d == 0` (3) calling `complete' (d-1) -- which includes a few operations and (4) construction of `Node x copiedTree copiedTree`. So mine definitely isn't `O(d)`. I've got *at least* `O(6/7/8)`, as I perceive. – Kevin Meredith Feb 25 '15 at 03:56
  • Kevin, I don't know what O(6/7/8) is supposed to mean. If you want to do this analysis for real, you need to actually write out the equations and then work through the math. If you add the equations to your question, that will definitely get you started in the right direction. – dfeuer Feb 25 '15 at 04:54

2 Answers


The interesting part of the code is the complete' function:

complete' d 
  | d == 0    = Empty
  | otherwise = let copiedTree = complete' (d-1) 
                in Node x copiedTree copiedTree

As Cirdec's answer suggests, we should be careful to analyze each part of the implementation to make sure our assumptions are valid. As a general rule, we can assume that the following take 1 unit of time each*:

  1. Using a data constructor to construct a value (e.g., using Empty to make an empty tree or using Node to turn a value and two trees into a tree).

  2. Pattern matching on a value to see what data constructor it was built from and what values the data constructor was applied to.

  3. Guards and if/then/else expressions (which are implemented internally using pattern matching).

  4. Comparing an Integer to 0.

Cirdec mentions that the operation of subtracting 1 from an Integer is logarithmic in the size of the integer. As they say, this is essentially an artifact of the way Integer is implemented. It is possible to implement integers so that it takes only one step to compare them to 0 and also takes only one step to decrement them by 1. To keep things very general, it's safe to assume that there is some function c such that the cost of decrementing an Integer is c(depth).
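
For example, here is a minimal sketch of one such representation: Peano-style naturals, where a zero test and a decrement each cost a single pattern match. (The trade-off is that most other operations become much slower, so this is an illustration rather than a practical suggestion.)

-- Unary (Peano) naturals: testing for zero and decrementing
-- each cost one pattern match, i.e. one step.
data Nat = Z | S Nat

isZero :: Nat -> Bool
isZero Z = True
isZero _ = False

decrement :: Nat -> Maybe Nat
decrement Z     = Nothing   -- can't decrement zero
decrement (S n) = Just n    -- one match, no arithmetic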


Now that we've taken care of those preliminaries, let's get down to work! As is generally the case, we need to set up a system of equations and solve it. Let f(d) be the number of steps needed to calculate complete' d. Then the first equation is very simple:

f(0) = 2

That's because it costs 1 step to compare d to 0, and another step to check that the result is True.

The other equation is the interesting part. Think about what happens when d > 0:

  1. We calculate d == 0.
  2. We check if that is True (it's not).
  3. We calculate d-1 (let's call the result dm1)
  4. We calculate complete' dm1, saving the result as copiedTree.
  5. We apply a Node constructor to x, copiedTree, and copiedTree.

The first part takes 1 step, the second part takes 1 step, the third part takes c(depth) steps, and the fifth part takes 1 step. What about the fourth part? Well, that takes f(d-1) steps, so this will be a recursive definition.

f(0) = 2
f(d) = (3+c(depth)) + f(d-1)    when d > 0

OK, now we're cooking with gas! Let's calculate the first few values of f:

f(0) = 2
f(1) = (3+c(depth)) + f(0) = (3+c(depth)) + 2
f(2) = (3+c(depth)) + f(1)
     = (3+c(depth)) + ((3+c(depth)) + 2)
     = 2*(3+c(depth)) + 2
f(3) = (3+c(depth)) + f(2)
     = (3+c(depth)) + (2*(3+c(depth)) + 2)
     = 3*(3+c(depth)) + 2

You should be starting to see a pattern by now:

f(d) = d*(3+c(depth)) + 2
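
Before proving that, a quick mechanical sanity check doesn't hurt. Here is a small sketch that fixes c(depth) at an arbitrary constant (1, in this case) and compares the recurrence against the conjectured closed form:

-- Sanity check: the recurrence versus the conjectured closed form,
-- with c(depth) fixed at an arbitrary constant.
c :: Integer
c = 1

fRec :: Integer -> Integer
fRec 0 = 2
fRec d = (3 + c) + fRec (d - 1)

fClosed :: Integer -> Integer
fClosed d = d * (3 + c) + 2

-- evaluates to True: the two agree on 0..100
check :: Bool
check = all (\d -> fRec d == fClosed d) [0 .. 100]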

We generally prove things about recursive functions using mathematical induction.

Base case:

The claim holds for d=0 because 0*(3+c(depth))+2=0+2=2=f(0).

Inductive step:

Suppose that the claim holds for d=D. Then

f(D+1) = (3+c(depth)) + f(D)
       = (3+c(depth)) + (D*(3+c(depth))+2)
       = (D+1)*(3+c(depth))+2

So the claim holds for D+1 as well. Thus by induction, it holds for all natural numbers d. As a reminder, this gives the conclusion that complete' d takes

f(d) = d*(3+c(depth))+2

time. Now how do we express that in big O terms? Well, big O doesn't care about the constant coefficients of any of the terms, and only cares about the highest-order terms. We can safely assume that c(depth)>=1, so we get

f(d) ∈ O(d*c(depth))

Zooming out to complete, this looks like O(depth*c(depth)).

If you use the real cost of Integer decrement, this gives you O(depth*log(depth)). If you pretend that Integer decrement is O(1), this gives you O(depth).
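
If you want the O(1)-decrement assumption to actually hold for realistic inputs, one option is a fixed-width machine integer, whose decrement is a single instruction. Here is a sketch of that variant (the name completeInt is mine; Int would overflow past maxBound, but a tree that deep could never be built anyway):

-- The same algorithm with a machine-word depth. Decrementing an
-- Int takes constant time, so under the unit-cost model above
-- this version runs in O(d).
completeInt :: a -> Int -> Maybe (Tree a)
completeInt x depth
  | depth < 0 = Nothing
  | otherwise = Just (go depth)
  where
    go 0 = Empty
    go d = let t = go (d - 1) in Node x t t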

Side note: As you continue to work through Okasaki, you will eventually reach section 10.2.1, where you will see a way to implement natural numbers supporting O(1) decrement and O(1) addition (but not efficient subtraction).

* Haskell's lazy evaluation keeps this from being precisely true, but if you pretend that everything is evaluated strictly, you will get an upper bound for the true value, which will be good enough in this case. If you want to learn how to analyze data structures that use laziness to get good asymptotic bounds, you should keep reading Okasaki.

dfeuer
  • Thanks for this detailed answer. But, why doesn't `f(0) == 3`: `(1) depth < 0`, `(2) d == 0` and `(3) Empty`? – Kevin Meredith Feb 26 '15 at 03:03
  • Also, am I correct in that my solution is **not** `O(d)` - it requires using the `O(1)` technique that you described? – Kevin Meredith Feb 26 '15 at 03:08
  • @KevinMeredith, it kind of depends how you look at it. In theory, it's not O(d) because `Integer` decrement is not. But (1) you could "fix" that using a different implementation of natural numbers, and (2) for all practical purposes it might as well be O(d), because it's O(d log d) with a tiny constant factor. – dfeuer Feb 26 '15 at 03:15
  • @KevinMeredith, you should think about what makes that constant factor so tiny in this case. – dfeuer Feb 26 '15 at 03:17
  • I would upvote this answer just for properly using `∈` instead of `=` for containment in an `O` function class. – Cactus Feb 26 '15 at 04:03

Theoretical Answer

No, it does not run in O(d) time. Its asymptotic performance is dominated by the Integer subtraction d-1, which takes O(log d) time. This is repeated O(d) times, giving an asymptotic upper bound on time of O(d log d).

This upper bound can improve if you use an Integer representation with an asymptotically optimal O(1) decrement. In practice we don't, since the asymptotically optimal Integer implementations are slower even for unimaginably large values.

In practice, the Integer arithmetic will be a small part of the running time of the program. For practical "large" depths (smaller than a machine word), the program's running time will be dominated by allocating and populating memory. For larger depths, you will exhaust the resources of the computer.

Practical Answer

Ask the run time system's profiler.

In order to profile your code, we first need to make sure it is run. Haskell is lazily evaluated, so, unless we do something to cause the tree to be completely evaluated, it might not be. Unfortunately, completely exploring the tree will take O(2^d) steps. We could avoid forcing nodes we had already visited if we kept track of their StableNames. Fortunately, traversing a structure and keeping track of visited nodes by their memory locations is already provided by the data-reify package. Since we will be using it for profiling, we need to install it with profiling enabled (-p).

cabal install -p data-reify

Using Data.Reify requires the TypeFamilies extension and Control.Applicative.

{-# LANGUAGE TypeFamilies #-}

import Data.Reify
import Control.Applicative

We reproduce your Tree code.

data Tree a = Empty | Node a (Tree a) (Tree a)

complete :: a -> Integer -> Maybe (Tree a)
complete x depth 
 | depth < 0  = Nothing
 | otherwise  = Just $ complete' depth
                        where complete' d 
                                | d == 0    = Empty
                                | otherwise = let copiedTree = complete' (d-1) 
                                              in Node x copiedTree copiedTree
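
(As an aside, this is why we can't simply force and count the tree directly: a traversal that doesn't track sharing, like the size function sketched below, does work proportional to the fully unfolded tree, Θ(2^d).)

-- A naive traversal ignores sharing and visits the tree as if it
-- were fully unfolded: Θ(2^d) steps for a tree of depth d.
size :: Tree a -> Integer
size Empty        = 0
size (Node _ l r) = 1 + size l + size r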

Converting data to a graph with data-reify requires that we have a base functor for the data type. The base functor is a representation of the type with explicit recursion removed. The base functor for Tree is TreeF. An additional type parameter is added for the representation of recursive occurrence of the type, and each recursive occurrence is replaced by the new parameter.

data TreeF a x = EmptyF | NodeF a x x
    deriving (Show)

The MuRef instance required by reifyGraph requires that we provide a mapDeRef to traverse the structure with an Applicative and convert it to the base functor. The first argument provided to mapDeRef, which I have named deRef, is how we can convert the recursive occurrences of the structure.

instance MuRef (Tree a) where
    type DeRef (Tree a) = TreeF a
    mapDeRef deRef Empty        = pure EmptyF
    mapDeRef deRef (Node a l r) = NodeF a <$> deRef l <*> deRef r

We can make a little program to run to test the complete function. When the graph is small, we'll print it out to see what's going on. When the graph gets big, we'll only print out how many nodes it has.

main = do
    d <- getLine
    let (Just tree) = complete 0 (read d)
    graph@(Graph nodes _) <- reifyGraph tree
    if length nodes < 30 
    then print graph
    else print (length nodes)

I put this code in a file named profileSymmetricTree.hs. To compile it, we need to enable profiling with -prof and enable the run-time system with -rtsopts.

ghc -fforce-recomp -O2 -prof -fprof-auto -rtsopts profileSymmetricTree.hs

When we run it, we'll enable the time profile with the +RTS option -p. We'll give it the depth input 3 for the first run.

profileSymmetricTree +RTS -p
3
let [(1,NodeF 0 2 2),(2,NodeF 0 3 3),(3,NodeF 0 4 4),(4,EmptyF)] in 1

We can already see from the graph that the nodes are being shared between the left and right sides of the tree.
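
For comparison, consider a hypothetical variant that makes two separate recursive calls instead of binding the result once. Barring common-subexpression elimination by the optimizer, each call builds its own copy, so reifyGraph would see on the order of 2^d distinct nodes instead of d+1:

-- Hypothetical non-sharing variant (assumes d >= 0): the two
-- recursive positions are separate calls, so no subtrees are
-- shared and the reified graph grows exponentially with d.
completeNoShare :: a -> Integer -> Tree a
completeNoShare _ 0 = Empty
completeNoShare x d = Node x (completeNoShare x (d - 1))
                             (completeNoShare x (d - 1))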

The profiler makes a file, profileSymmetricTree.prof.

                                                                                individual     inherited
COST CENTRE                        MODULE                     no.     entries  %time %alloc   %time %alloc

MAIN                               MAIN                        43           0    0.0    0.7   100.0  100.0
 main                              Main                        87           0  100.0   21.6   100.0   32.5
  ...
  main.(...)                       Main                        88           1    0.0    4.8     0.0    5.1
   complete                        Main                        90           1    0.0    0.0     0.0    0.3
    complete.complete'             Main                        92           4    0.0    0.2     0.0    0.3
     complete.complete'.copiedTree Main                        94           3    0.0    0.1     0.0    0.1

It shows in the entries column that complete.complete' was executed 4 times, and complete.complete'.copiedTree was evaluated 3 times.

If you repeat this experiment with different depths, and plot the results, you should get a good idea what the practical asymptotic performance of complete is.

Here are the profiling results for a much greater depth, 300000.

                                                                                individual     inherited
COST CENTRE                        MODULE                     no.     entries  %time %alloc   %time %alloc

MAIN                               MAIN                        43           0    0.0    0.0   100.0  100.0
 main                              Main                        87           0    2.0    0.0    99.9  100.0
  ...
  main.(...)                       Main                        88           1    0.0    0.0     2.1    5.6
   complete                        Main                        90           1    0.0    0.0     2.1    5.6
    complete.complete'             Main                        92      300001    1.3    4.4     2.1    5.6
     complete.complete'.copiedTree Main                        94      300000    0.8    1.3     0.8    1.3
Cirdec
  • Well gosh if you're going to count `d-1` as log(d), we might as well realize that subtracting 1 from an integer has amortized constant time, since the nth bit only needs to be touched once every 2^n times. – luqui Feb 25 '15 at 07:53
  • @luqui That's true if you represent `Integer` as a list of digits with the least significant digit first or have mutable `Integer`s. GHC's [default](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Libraries/Integer) `Integer` representation [is a `ByteArray#`](https://hackage.haskell.org/package/integer-gmp-0.5.1.0/docs/GHC-Integer-GMP-Internals.html). Subtracting `1` requires copying the entire `ByteArray#`, which is `O(n)` where `n` is the number of bits in the array. – Cirdec Feb 25 '15 at 15:10
  • You might want to mention that the practical limit on the `Integer` size in this particular program is imposed by theoretical physical limits. If you had a 256-bit depth, say, you couldn't ever reach a leaf because the universe [would have ended by then](https://en.wikipedia.org/wiki/Ultimate_fate_of_the_universe). But my bigger concern is that your "practical" answer is a bit too fancy to help the OP, and the theoretical answer is incomplete. – dfeuer Feb 25 '15 at 15:57
  • It would probably be useful to give a theoretical answer based on an abstract machine with O(1) arithmetic. – dfeuer Feb 25 '15 at 15:58
  • @dfeuer The theoretical answer is deliberately incomplete for a machine with `O(1)` arithmetic to encourage the reader to either think, "`O(log d)` and `O(d)` are to `O(d log d)` as `O(1)` and `O(d)` are to ???" or to explore the practical answer. The practical answer is as simple as possible to avoid waiting until the universe would end *a thousand times* to force a fairly small structure like `complete 0 300000`. – Cirdec Feb 25 '15 at 16:18
  • @Cirdec, I understand the value of the practical answer you give, and also why it is harder to construct than one might initially expect. But since it looks quite hard to an intermediate-level Haskeller like myself, I imagine it would go completely over the head of a beginner. But then again, different people do learn things differently, so maybe I should just shut up and write my own answer already. – dfeuer Feb 25 '15 at 16:28
  • @dfeuer A *determined* novice reader should be able to deal with the practical answer, "We're going to run the question code ... here's the question code ... here's some code the author claims will run it ... here are some results." There are two interesting results. With an input depth of `3` they were `4` and `3`. With an input depth of `300000` they were `300001` and `300000`. The reader is left to draw all of their own conclusions. – Cirdec Feb 25 '15 at 16:28
  • Thanks for this detailed answer. On your last sentence, `With a input depth of 3 and there 4 and 3...`. How would I use those data points to come up with an `O(n)` measurement? – Kevin Meredith Feb 26 '15 at 03:24
  • @KevinMeredith First, you need to reason out that one execution of `complete'` (ignoring recursion) takes at most some *constant* amount of time `c` (if we ignore the integer decrement). The problem then is to figure out how many times `complete'` is being evaluated. From the two measurements, we can see it's being evaluated exactly `d + 1` times. The overall running time is then `(d+1)*c` which is in `O(d)`. We can see that nothing inside `complete'` (like `copiedTree`) is being evaluated more than proportionally to `complete'`. – Cirdec Feb 26 '15 at 03:32
  • @KevinMeredith If it weren't an easy to tease out linear relationship, you'd run it many times, plot the results, and hypothesize what an upper bound is, then check the hypothesis with larger and larger inputs. – Cirdec Feb 26 '15 at 03:35