explain the Haskell breadth first numbering code to traverse trees

Question

I am reading this paper by Chris Okasaki; titled "Breadth-First Numbering: Lessons from a Small Exercise in Algorithm Design".

A question is - how is the magic happening in the algorithm? There are some figures (e.g. figure 7 titled "threading the output of one level into the input of next level") Unfortunately, maybe it's only me, but that figure has completely baffled me. I don't understand how the threading happens at all?

(Most of the code in that paper (including the snippet you posted) is in SML.) — Dogbert, Apr 19 '15 at 06:50
There's a Haskell implementation of that algorithm for [`unfoldForestM_BF` in `Data.Tree`](http://hackage.haskell.org/package/containers-0.2.0.1/docs/src/Data-Tree.html#unfoldTreeM_BF). — Cirdec, Apr 20 '15 at 04:56
By the way, you should also look into the approach Okasaki contrasts his with—what I think he called the level-oriented solution. It's quite nice as well, and I suspect faster. — dfeuer, Apr 20 '15 at 18:51
@Cirdec, you may find some things in my answer oddly familiar :-) — dfeuer, Apr 21 '15 at 02:23

score 3 · Accepted Answer · answered Apr 20 '15 at 18:45

3

Breadth first traversal means traversing levels of a tree one by one. So let's assume we already know what are the numbers at the beginning of each level - the number of traversed elements so far before each level. For the simple example in the paper

import Data.Monoid

data Tree a = Tree (Tree a) a (Tree a)
            | Empty
  deriving (Show)

example :: Tree Char
example = Tree (Tree Empty 'b' (Tree Empty 'c' Empty)) 'a' (Tree Empty 'd' Empty)

the sizes would be 0, 1, 3, 4. Knowing this, we can thread such a list of sizes through a give tree (sub-tree) left-to-right: We advance the first element of the list by one for the node, and thread the tail of the list first through the left and then through the right subtree (see thread below).

After doing so, we'll get again the same list of sizes, only shifted by one - now we have the total number of elements after each level. So the trick is: Assume we have such a list, use it for the computation, and then feed the output as the input - tie the knot.

A sample implementation:

tagBfs :: (Monoid m) => (a -> m) -> Tree a -> Tree m
tagBfs f t = let (ms, r) = thread (mempty : ms) t
              in r
  where
    thread ms Empty = (ms, Empty)
    thread (m : ms) (Tree l x r) =
        let (ms1, l') = thread ms l
            (ms2, r') = thread ms1 r
         in ((m <> f x) : ms2, Tree l' m r')

generalized to Monoid (for numbering you'd give const $ Sum 1 as the function).

answered Apr 20 '15 at 18:45

Petr

62,528
13
153
317

These days, wouldn't one want to write a traversal function for arbitrary applicatives? – dfeuer Apr 20 '15 at 19:38
@dfeuer One would, but the problem is we need to traverse level by level, yet get the sub-nodes from the level below (`l'` and `r'`). So I'm not sure if it's possible for a general aplicative. – Petr Apr 20 '15 at 19:40
See my answer. I'm pretty sure I got it. I defined a `Traversable` instance based on the `unfoldForestBF` concept. – dfeuer Apr 21 '15 at 02:21
@PetrPudlák thanks, it's amazing; I am trying to wrap my head around it. still looks like magic. can you explain how the `ms` is defined in the line `tagBfs f t = let (ms, r) = thread (mempty : ms) t`? it looks infinite recursion. – mntk123 Apr 21 '15 at 14:25
@mntk123 It's very similar to defining the Fibonacci sequence as an infinite list [`fibs = 0 : 1 : zipWith (+) fibs (tail fibs)`](https://wiki.haskell.org/The_Fibonacci_sequence#Canonical_zipWith_implementation). Since `thread` processes the list one by one (level by level), it only needs the head `mempty` to process the first level. For computing the second level, it uses the result computed in the first one etc. The list is potentially infinite, but this is fine, as we only need as many elements of the list as to which depth we inspect the tree. – Petr Apr 21 '15 at 15:29
@PetrPudlák okay, got it; but then will it work with any finite list in place of `ms`, (assuming the list is at least of length equal to the number of elements in the tree)? in other words, is lazy evaluation needed here? – mntk123 Apr 22 '15 at 07:08
@mntk123 It's not about 'any' list, the list is defined by the equation. Yes, lazy evaluation is needed here - the list referenced while being computed. You could also use finite lists (assuming the depth of the tree is finite), perhaps by having it as a list of `Maybe`s where `Nothing` means no nodes traversed on a particular level, and end the list then. But in Haskell you don't need it and it'd only make the code more complex with no real gain. – Petr Apr 23 '15 at 07:51
@dfeuer IIUC, the difference is that your solution splits levels into separate temporary data structures, traverses them and merges back, right? I wonder if it's be possible to merge the approaches together somehow. – Petr Apr 23 '15 at 09:33
@PetrPudlák, It does that, yes, and I don't *think* it's possible to avoid that without significant damage. Cirdec did a lot of good and interesting work on such things in the various rose tree monadic unfold attempts, and may therefore have a deeper understanding. – dfeuer Apr 23 '15 at 19:46

score 3 · Answer 2 · edited May 23 '17 at 12:09

One way to view tree numbering is in terms of a traversal. Specifically, we want to traverse the tree in breadth-first order using State to count up. The necessary Traversable instance looks something like this. Note that you'd probably actually want to define this instance for a newtype like BFTree, but I'm just using the raw Tree type for simplicity. This code is strongly inspired by ideas in Cirdec's monadic rose tree unfolding code, but the situation here seems to be substantially simpler. Hopefully I haven't missed something horrible.

{-# LANGUAGE DeriveFunctor,
             GeneralizedNewtypeDeriving,
             LambdaCase #-}
{-# OPTIONS_GHC -Wall #-}

module BFT where

import Control.Applicative
import Data.Foldable
import Data.Traversable
import Prelude hiding (foldr)

data Tree a = Tree (Tree a) a (Tree a)
            | Empty
  deriving (Show, Functor)

newtype Forest a = Forest {getForest :: [Tree a]}
   deriving (Functor)

instance Foldable Forest where
  foldMap = foldMapDefault

-- Given a forest, produce the forest consisting
-- of the children of the root nodes of non-empty
-- trees.
children :: Forest a -> Forest a
children (Forest xs) = Forest $ foldr go [] xs
  where
    go Empty c = c
    go (Tree l _a r) c = l : r : c

-- Given a forest, produce a list of the root nodes
-- of the elements, with `Nothing` values in place of
-- empty trees.
parents :: Forest a -> [Maybe a]
parents (Forest xs) = foldr go [] xs
  where
    go Empty c = Nothing : c
    go (Tree _l a _r) c = Just a : c

-- Given a list of values (mixed with blanks) and
-- a list of trees, attach the values to pairs of
-- trees to build trees; turn the blanks into `Empty`
-- trees.
zipForest :: [Maybe a] -> Forest a -> [Tree a]
zipForest [] _ts = []
zipForest (Nothing : ps) ts = Empty : zipForest ps ts
zipForest (Just p : ps) (Forest ~(t1 : ~(t2 : ts'))) =
   Tree t1 p t2 : zipForest ps (Forest ts')

instance Traversable Forest where
  -- Traversing an empty container always gets you
  -- an empty one.
  traverse _f (Forest []) = pure (Forest [])

  -- First, traverse the parents. The `traverse.traverse`
  -- gets us into the `Maybe`s. Then traverse the
  -- children. Finally, zip them together, and turn the
  -- result into a `Forest`. If the `Applicative` in play
  -- is lazy enough, like lazy `State`, I believe 
  -- we avoid the double traversal Okasaki mentions as
  -- a problem for strict implementations.
  traverse f xs = (Forest .) . zipForest <$>
          (traverse.traverse) f (parents xs) <*>
          traverse f (children xs)

instance Foldable Tree where
  foldMap = foldMapDefault

instance Traversable Tree where
  traverse f t =
       (\case {(Forest [r]) -> r;
               _ -> error "Whoops!"}) <$>
       traverse f (Forest [t])

Now we can write code to pair up each element of the tree with its breadth-first number like this:

import Control.Monad.Trans.State.Lazy

numberTree :: Tree a -> Tree (Int, a)
numberTree tr = flip evalState 1 $ for tr $ \x ->
      do
        v <- get
        put $! (v+1)
        return (v,x)

explain the Haskell breadth first numbering code to traverse trees

2 Answers2

Linked