Gather data about existing tree-like data

Question

Let's say we have existing tree-like data and we would like to add information about depth of each node. How can we easily achieve that?

Data Tree = Node Tree Tree | Leaf

For each node we would like to know in constant complexity how deep it is. We have the data from external module, so we have information as it is shown above. Real-life example would be external HTML parser which just provides the XML tree and we would like to gather data e.g. how many hyperlinks every node contains.

Functional languages are created for traversing trees and gathering data, there should be an easy solution.

Obvious solution would be creating parallel structure. Can we do better?

... that's not the point of functional languages. Also, the data structure you have defined isn't terribly useful since it only keeps track of the shape of the tree, not the data in it. — Bakuriu, Nov 09 '15 at 19:58
Well you can imagine that there is also big data contained within, that's just simplification. — JKS, Nov 09 '15 at 20:50
I disagree with the close votes. There's a sizable body of knowledge regarding tree annotations in Haskell, which an answer could meaningfully cover. — András Kovács, Nov 09 '15 at 21:02

Daniel Wagner · Answer 1 · 2015-11-09T21:42:14.380

4

The standard trick, which I learned from Chris Okasaki's wonderful Purely Functional Data Structures is to cache the results of expensive operations at each node. (Perhaps this trick was known before Okasaki's thesis; I don't know.) You can provide smart constructors to manage this information for you so that constructing the tree need not be painful. For example, when the expensive operation is depth, you might write:

module SizedTree (SizedTree, sizedTree, node, leaf, depth) where

data SizedTree = Node !Int SizedTree SizedTree | Leaf

node l r = Node (max (depth l) (depth r) + 1) l r
leaf = Leaf

depth (Node d _ _) = d
depth Leaf = 0

-- since we don't expose the constructors, we should
-- provide a replacement for pattern matching
sizedTree f v (Node _ l r) = f l r
sizedTree f v Leaf = v

Constructing SizedTrees costs O(1) extra work at each node (hence it is O(n) work to convert an n-node Tree to a SizedTree), but the payoff is that checking the depth of a SizedTree -- or of any subtree -- is an O(1) operation.

edited Nov 09 '15 at 21:42

answered Nov 09 '15 at 21:39

Daniel Wagner

145,880
9
220
380

1

Nice solution. Do you want `Node !Int ...`? – Rein Henrichs Nov 09 '15 at 21:41
@ReinHenrichs That's probably a good idea. Consider it stolen! – Daniel Wagner Nov 09 '15 at 21:43
You can't steal something that is freely given. :) – Rein Henrichs Nov 09 '15 at 21:43
Unless I don't understand something, I don't see how this is relevant for the answer. It doesn't really provide a way to extract/morph data from existing structure. – JKS Nov 09 '15 at 22:08
1

@JKS It is easy to convert from your existing structure to this structure. `convert :: Tree -> SizedTree; convert Leaf = leaf; convert (Node l r) = node l r`. – Daniel Wagner Nov 09 '15 at 22:51
Well, the problem is that parallel structures aren't quite maintainable. If source changes, then all source code needs refactoring. Also, I can't use functions already written for traversals of old structure. What I'm looking for is just making attachment to existing thing. I'm not sure if this is asking for impossible, but it's like I need mapping all nodes to new attribute. – JKS Nov 10 '15 at 00:56

score 2 · Answer 2 · edited May 23 '17 at 12:29

You do need some another data where you can store these Ints. Define Tree as

data Tree a = Node Tree a Tree | Leaf a

and then write a function

annDepth :: Tree a -> Tree (Int, a)

Your original Tree is Tree () and with pattern synonyms you can recover nice constructors.

If you want to preserve the original tree for some reason, you can define a view:

{-# LANGUAGE GADTs, DataKinds #-}

data Shape = SNode Shape Shape | SLeaf

data Tree a sh where
    Leaf :: a -> Tree a SLeaf
    Node :: Tree a lsh -> a -> Tree a rsh -> Tree a (SNode lsh rsh)

With this you have a guarantee that an annotated tree has the same shape as the unannotated. But this doesn't work good without proper dependent types.

Also, have a look at the question Boilerplate-free annotation of ASTs in Haskell?

score 0 · Answer 3 · edited May 23 '17 at 10:27

The standard solution is what @DanielWagner suggested, just extend the data structure. This can be somewhat inconvenient, but can be solved: Smart constructors for creating instances and using records for pattern matching.

Perhaps Data types a la carte could help, although I haven't used this approach myself. There is a library compdata based on that.

A completely different approach would be to efficiently memoize the values you need. I was trying to solve a similar problem and one of the solutions is provided by the library stable-memo. Note that this isn't a purely functional approach, as the library is internally based on object identity, but the interface is pure and works perfectly for the purpose.

Gather data about existing tree-like data

3 Answers3