Reading a tree from file in Haskell

Question

I want to create a tree from a file in Haskell. For that, i read the file into this list of lists:

The names in each element of the list follow this pattern:

["Name","Dad","Mum"]

[["Bob","Dylan","Susan"],
 ["Dylan","Cole","Sarah"],
 ["Cole","Patrick","Patricia"],
 ["Sarah","David","Fiona"],
 ["Susan","Michael","Madeline"]]

The desired output is something like:

Bob
      Dylan
            Cole
                  Patrick
                  Patricia
            Sarah
                  David
                  Fiona
      Susan
            Michael
            Madeline

The spaces could be a tab, i just put more to emphasise my point.

Here's what i've managed to do so far:

data Tree a = EmptyTree | Node a (Tree a) (Tree a) deriving (Show, Read, Eq) 

singleton :: a -> Tree a  
singleton x = Node x EmptyTree EmptyTree  

treeInsert :: (Ord a) => a -> Tree a -> Tree a  
treeInsert x EmptyTree = singleton x  
treeInsert x (Node a left right)   
    | x == a = Node x left right  
    | x < a  = Node a (treeInsert x left) right  
    | x > a  = Node a left (treeInsert x right)  

createTree :: (Ord a) => [a] -> Tree a
createTree [] = EmptyTree
createTree (x:xs) = createTree2 (Node x EmptyTree EmptyTree) xs
  where
    createTree2 tree [] = tree
    createTree2 tree (y:ys) = createTree2 (treeInsert y tree) ys

printTree :: Show a => Tree a -> IO ()
printTree = (mapM_ putStrLn) . treeIndent
  where
    treeIndent EmptyTree          = ["\nEmpty Tree\n"]
    treeIndent (Node v lb rb) =
      [(show v)] ++
      map ("      " ++) ls ++
      ("" ++ r) : map ("   " ++) rs
    where
        (r:rs) = treeIndent $ rb
        ls     = treeIndent $ lb

All this lets me, very basically create the tree, and print it to the screen. What i'm struggling with is the proper creating of the tree according to this problem.

Just a side note: for real-world genealogies, you might not want to [enforce tree-ness so carefully](http://stackoverflow.com/q/6163683/791604). — Daniel Wagner, Nov 30 '14 at 20:44

score 1 · Answer 1 · answered Nov 30 '14 at 20:19

It may be simpler to think about the more general version of the problem. Namely, consider a list of type [(a, Maybe a, Maybe a)]. You can build a tree (more specifically, a list of trees) from this list by taking each first element to be a node, and the 2nd and 3rd elements correspond to the branches - if they are Nothing, the branch is Nil. Otherwise, the semantics of this function correspond exactly to the one you want to write.

First, write a helper function for encoding this logic:

lookupDef :: Eq a => Maybe a -> [(a, Tree a)] -> Tree a
lookupDef Nothing   _ = Nil 
lookupDef (Just a) xs | Just r <- lookup a xs = r 
                      | otherwise             = Node a Nil Nil

The second arguement is the existing list of (key,value) pairs corresponding to the trees for the rest of the names. Then, if the value to lookup is not nothing but not one of the keys, it is a "terminal" value, so simply return it by itself in a tree.

Then, an intermediate function which constructs the aforementioned list of (key,value) pairs.

readTreeList :: Eq a => [(a, Maybe a, Maybe a)] -> [(a, Tree a)]
readTreeList [] = []
readTreeList xs@(_:_) = result where
 result = [ (p, Node p ? ?) | (p, l, r) <- xs ]

The above should be evident: every key in the input list must correspond to a value in the output. The tree for that key will be a Node p q r, where q/r are the trees corresponding to l/r. The first function above will compute q and r. This is where the lookupDef function comes in:

 result = [ (p, Node p (lookupDef l ?) (lookupDef r ?)) | (p, l, r) <- xs ]

But what is the list in which to lookup subtrees? The only such list we have is result, so lets try that:

 result = [ (p, Node p (lookupDef l result) (lookupDef r result)) 
          | (p, l, r) <- xs ]

By the magic of laziness, this will actually work.

Then to get a single tree out of this, simply take the first element of the result (your sample input indicates the first element should be used as the root). In fact, you can inline this part with the above:

readTree :: Eq a => [(a, Maybe a, Maybe a)] -> Tree a 
readTree [] = Nil 
readTree xs@(_:_) = snd h where 
  result@(h:_) = [ (p, Node p (lookupDef l result) (lookupDef r result)) 
                 | (p, l, r) <- xs ]

Then your data:

test = map (\([x,y,z]) -> (x, Just y, Just z))
  [["Dylan","Cole","Sarah"],
   ["Sarah","David","Fiona"],
   ["Bob","Dylan","Susan"],
   ["Cole","Patrick","Patricia"],
   ["Susan","Michael","Madeline"]]

And the result:

>printTree $ readTree test
"Bob"
  |"Dylan"
  |  |"Cole"
  |  |  |"Patrick"
  |  |  |"Patricia"
  |  |"Sarah"
  |  |  |"David"
  |  |  |"Fiona"
  |"Susan"
  |  |"Michael"
  |  |"Madeline"

This would certainly be faster with a datatype other than a list storing the key value pairs (Data.Map) but that is a different question.

Note that I modified/added to the definition slightly, but this isn't relevant to the above code:

{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}

import qualified Data.Foldable as F

data Tree a = Nil | Node a (Tree a) (Tree a) 
  deriving (Show, Read, Eq, Functor, F.Foldable)

This gives you fmap and toList.

formatTree Nil = Nil
formatTree (Node a l r) = Node (show a) 
                               (fmap ("  |" ++) $ formatTree l)
                               (fmap ("  |" ++) $ formatTree r)

printTree x = putStrLn . unlines . F.toList . formatTree $ x

This gives you a simpler pretty print function.

just out of curiosity, how would you make this faster with datatype? — , Nov 30 '14 at 20:48
@dcarou using `List` for key-value lookup is O(n) because the lookup must examine each member of the list to see if the key matches. Other data structures, like tree maps (as in `Data.Map`) and hash tables, can reduce that running time to O(log n) or O(1), respectively. — mdunsmuir, Dec 01 '14 at 04:18

mdunsmuir · Accepted Answer · 2014-12-01T04:09:57.743

If I understand correctly, you're having issues with two parts of this problem: creating the tree, and printing it in the desired style. I'll address each of these:

Creating the Tree

The wrinkle with this problem is that the input data comes in the form of what I'm going to call an association list, which associates each parent node with two child nodes. This list puts constraints on how you can build your tree, but it may not be immediately obvious how to proceed according to those constraints (and it's interesting to note that they don't specify a unique tree). I wrote this function to do it, using your Tree type:

data Tree a = EmptyTree | Node a (Tree a) (Tree a)

toTree :: [[String]] -> Tree String
toTree list = toTree' root
  where
    -- both these are extremely unsafe, as they assume that the input is a list
    -- of lists each with length three
    root = fst $ head mapping
    mapping :: [(String, (String, String))]
    mapping = fmap (\(p:c1:c2:[]) -> (p, (c1, c2))) list

    -- Recursively build our tree, using the association list defined above to
    -- look up the children for each node. If there are no children, we return
    -- a node with EmptyTree children instead.
    toTree' root = let childs = lookup root mapping
                   in  maybe (Node root EmptyTree EmptyTree)
                             (\(l, r) -> Node root (toTree' l) (toTree' r))
                             childs

This function turns your list inputs into a [(String, (String, String))], named mapping. Using the lookup function for Lists, we can use mapping as an association list and search for the children (String, String) associated with a parent String.

Then, we recursively build our tree using the toTree' function. At each node, it performs a lookup into the mapping association list for the children of that node. If there are children, it recursively adds them to the tree. Building the tree in this way means that the input tuples could have been in any order. The use of the lookup function for Lists here is quite inefficient, and Data.Map could be used instead if performance was a concern.

Printing the Tree

Your approach uses recursion, which is probably the easiest way to do this, but you still try to collect a list of all the output and then mapM over it at the end. I think it's easier to simply output the node contents as you traverse the tree, unless there's some reason not to (you could use a Writer monad instead of IO if you wanted to collect the output).

My approach uses an Int counter to keep track of the indent level:

printTree :: Tree String -> IO ()
printTree t = printTree' t 0
  where
    -- if we reached the bottom of the tree, do nothing
    printTree' EmptyTree _ = return ()

    -- We first print the current node's string value, and then recursively
    -- call ourselves for the children. This is a simple depth-first tree
    -- traversal, for which binary trees are well-suited.
    printTree' (Node s l r) depth = do
      putStrLn $ replicate depth ' ' ++ s
      printTree' l (depth + 2)
      printTree' r (depth + 2)

The output is nicely formatted:

Bob
  Dylan
    Cole
      Patrick
      Patricia
    Sarah
      David
      Fiona
  Susan
    Michael
    Madeline

An Alternative

I suspect that this is a homework problem or similar, which may make the use of binary trees non-negotiable, but it's pretty easy here to do a depth-first traversal of the adjacency list without ever turning it into a binary tree (the algorithm looks pretty similar):

simpleTreePrint :: [[String]] -> IO ()
simpleTreePrint list = p' (fst $ head mapping) 0
  where
    -- this recursive function prints the 'root' name (eg "Bob") that it is
    -- called with, then recursively calls itself for all the children of
    -- that name that it finds in the 'mapping' data structure
    p' :: String -> Int -> IO ()
    p' root depth = let children = maybe [] id $ lookup root mapping
                    in  do putStrLn $ replicate depth ' ' ++ root
                           forM_ children $ \c -> p' c (depth + 2)

    -- to make child lookups easier, we convert the original list of lists
    -- of names into tuples whose first values are the 'parent' name, and
    -- whose second values are the remaining names. This allows us to use the
    -- regular List lookup function, which is not efficient but may suffice
    -- for this application
    mapping :: [(String, [String])]
    mapping = fmap (\(p:cs) -> (p, cs)) list

This approach treats your input data set more like a generalized graph. It can handle parents with more than two children, and with a more complex input data set we could leverage the graph approach to do even more cool stuff. Our simpleTreePrint function would probably break, though, as it will only really work when the input data are strictly a tree.

score 0 · Answer 3 · answered Nov 30 '14 at 18:52

Let's give a type to the list of child-parent associations:

type Parents = [ [String] ]

theParents :: Parents
theParents = [ ["Bob","Dylan","Susan"], ["Dylan","Cole","Sarah"], ... ]

You first have to write a function which looks up data in this list:

lookupParents :: Parents -> String -> (Maybe String, Maybe String)
lookupParents pars name = ...???...

e.g.:

lookupParents theParents "Bob" = (Just "Dylan", Just "Susan")
lookupParents theParents "nobody" = (Nothing, Nothing)

Next, your buildTree function will look like this:

buildTree :: Parents -> String -> Tree String
buildTree parents rootName = Node rootName leftTree rightTree
  where (mleft, mright) = lookupParents parents rootName
        leftTree = ... some function of mleft ...
        rightTree = ... some function of mright ...

but how do i search the names ? because there isn't any specific order, for example, Susan's parents appear in the last line, and yet Susan appears on the first... do i need to store that data first? — , Nov 30 '14 at 19:01
The lookup function tries to find an element `[child, dad, mom]` of `theParents` list where `child` equals the target name. Does this help? Study this expression: `[ family | family <- theParents, head family == "Dylan" ]` — ErikR, Nov 30 '14 at 19:20

Reading a tree from file in Haskell

3 Answers3