Recursively walking an AST using foldl and the Reader monad without boilerplate

Question

I'm traversing an AST using simple pattern matching and the Reader monad.

Elsewhere in my project I've defined a walk function for traversing the AST, which at its heart uses foldl to reduce the results of visiting each node in the tree into a single monoidal result (for example, to produce a "symbol table" from the special nodes in the tree).

My question is: is it possible to combine these two approaches and use a function like my walk function:

walk :: Monoid a => (Node -> a) -> a -> Node -> a
walk f acc n = foldl (walk f) (acc <> f n) children
  where
    children = case n of
      Blockquote b           -> b
      DocBlock d             -> d
      FunctionDeclaration {} -> functionBody n
      List l                 -> l
      ListItem i             -> i
      Paragraph p            -> p
      Unit u                 -> u
      _                      -> [] -- no Node children

and the Reader — like the traversal in the code below (some bits omitted for brevity) — at the same time?

markdown :: Node -> String
markdown n = runReader (node n) state
  where state = State (getSymbols n) (getPluginName n)

node :: Node -> Env
node n = case n of
  Blockquote b            -> blockquote b >>= appendNewline >>= appendNewline
  DocBlock d              -> nodes d
  FunctionDeclaration {}  -> nodes $ functionBody n
  Paragraph p             -> nodes p >>= appendNewline >>= appendNewline
  Link l                  -> link l
  List ls                 -> nodes ls >>= appendNewline
  ListItem l              -> fmap ("- " ++) (nodes l) >>= appendNewline
  Unit u                  -> nodes u

My use motivation here is that my walk function already encodes the knowledge of how to do get the children for each pattern and how to perform an in-order traversal of the AST. I don't really want to reimplement that for every traversal, so it would be nice to use walk in more places, including ones where I need to use Reader (and likely later on, State, probably in a stack).

Can these things be fruitfully combined?

I don't know and I don't have time to look in depth right now, but I'm guessing you should look less at `foldl` and more at `traverse`. — dfeuer, Apr 05 '16 at 05:35

hao · Accepted Answer · 2016-04-05T07:34:04.177

A lensy approach

A moment for generic programming to shine! This very problem, the problem of folding over a recursive datatype without boilerplate, is the motivation for the uniplate/biplate library. That design now lives on in its most modern form in Control.Lens.Plated c/o the lens package. To take advantage of it:

Turn on DeriveDataTypeable and add deriving (Data) to your Node, ArgumentList, Argument.
Right away you're able to take advantage of uniplate from Data.Data.Lens. This is a traversal, an object that – when passed to the right lens helpers – will yield you all the values of type Node inside a given Node. Basically it runs one step of your recursive walk function.

An example:

λ> Blockquote [BreakTag, Blockquote [BreakTag]] ^.. uniplate
[BreakTag,Blockquote [BreakTag]]
λ> Blockquote [BreakTag, Blockquote [BreakTag]] ^.. uniplate . uniplate
[BreakTag]
λ> Blockquote [BreakTag, Blockquote [BreakTag]] ^.. uniplate . uniplate . uniplate
[]

But wait, there's more. If uniplate is one small step for generics, then cosmosOf uniplate is one large step for programmerkind. cosmosOf repeatedly uses uniplate to retrieve the children, grandchildren, greatgrandchildren, and so on from a given Node.
```
λ> Blockquote [BreakTag, Blockquote [BreakTag]] ^..  cosmosOf uniplate
[ Blockquote [BreakTag,Blockquote [BreakTag]]
, BreakTag
, Blockquote [BreakTag]
, BreakTag]
```
In both examples, we're taking advantage of how compositional lens traversals (and folds) are. The hierarchy of lens and why they compose so well is beyond the confines of this little textbox, but suffice it to say that they're super mega useful.
Use the foldlOf helper in Control.Lens.Fold to implement your walk function:
```
walk' :: Monoid a => (Node -> a) -> a -> Node -> a
walk' f acc n =
  foldlOf (cosmosOf uniplate . to f) (<>) acc n
```
Not so bad. to f creates a getter from your f and it's composed with the cosmic fold to reach all the descendants; each value from this getter is folded and accumulated into our monoid.

As for node, you'll have to build a custom fold. cosmosOf uniplate doesn't work so well here because sometimes you short-circuit the recursion (e.g. in the Blockquote case). You'll have to write cosmosOf foo and build foo piecemeal out of lens helpers; note you can still use uniplate to discharge most of the cases in your custom fold. This is quite a large bit of code and it's getting awful late, so I'll leave it as an exercise to the reader. I'm 90% certain it's possible.

As for the reader monad, you can either use foldlMOf or note that type Env = Reader State String is isomorphic to State -> String and note that State -> String has a Monoid instance because Monoid String exists. This all means that you should be able to implement node with the non-monadic foldlOf, like we did above – all we really want to do is concatenate a bunch of monoidal values, in the end.

This solution isn't perfect: it requires future code readers to know a good bit of knowledge about lens and how traversals/folds/getters gel with Data.Data and why these functions all have funny little Of suffixes on them. But you must admit that there's a beautiful sort of conciseness and power to Plated that abstracts away the boring recursion part of folding over custom datatypes so you only have to pattern match on the leaves of the data structure (like BreakTag in node) and edge cases (like Blockquote in node).

score 3 · Answer 2 · edited May 23 '17 at 11:52

Your walk function looks very much like the Foldable class, which is the superclass of Traversable (which dfeuer also points out in a comment). In your position I'd have a look at the mono-traversable package.

But I have a hunch that your Node type won't actually turn out to be a good MonoTraversable instance as currently formulated. The reason, informally, is that there isn't a good separation between these concepts:

The structure of the AST: which nodes are the children of which.
The content of the AST: the "attributes" at each node.

One very informal way to think of the Functor and Traversable classes is that they operate on the "content" of a value without altering is "structure" (but beware of taking that analogy too literally). I considered writing a MonoTraversable instance for your Node type but backed off quickly after some thought:

{-# LANGUAGE TypeFamilies #-}

import Data.MonoTraversable

-- Here is the root of the problem.  The type function 
-- `Element` is supposed to pick out which pieces of a 
-- `Node` are its "content" as opposed to its "structure,"
-- but with `Node` as formulated there isn't a distinction!
type instance Element Node = Node

Given that, the otraverse method, which has this principal type:

otraverse :: Applicative f => (Element mono -> f (Element mono)) -> mono -> f mono

...would end up with this specialized type when mono := Node:

otraverse :: Applicative f => (Node -> f Node) -> Node -> f Node

...and I'm worried that somebody could try to call otraverse with a function that discards the children of the root node, e.g., how evil is it when somebody attempts otraverse (return Separator)?

I haven't checked whether these instances are unlawful for sure, so I might be worried over nothing here, it's worth checking if I'm wrong. But it certainly "smells bad" according to my design sense.

So what to do? I'd look into reformulating the Node type a little bit, splitting it into two types:

A Content type that expresses the information that is visible at each step of the left-to-right walk of the AST, but not how the Contents nest;
A Structure type that defines the possible shapes of the tree—how the Contents nest.

Once that is done, you'd be able to reformulate your walk function like this:

walk :: Monoid a => (Content -> a) -> a -> Structure -> a

Note that the second argument of this function (the initial acc :: Monoid a => a value) is superfluous, these two functions look interdefinable:

walk' :: Monoid a => (Content -> a) -> Structure -> a
walk' f = walk f mempty

walk :: Monoid a => (Content -> a) -> a -> Structure -> a
walk f acc tree = acc <> walk' f tree

And my walk' signature then looks like it would be the ofoldMap method from mono-traversable:

ofoldMap :: Monoid m => (Element mono -> m) -> mono -> m

Which then would make it natural to ask whether you can implement MonoTraversable for your reformulated type and get the otraverse operation whose signature I mentioned above, which specialized with f := Reader r, and mono := Structure and Element mono := Content would be:

otraverse :: (Content -> Reader r Content) -> Structure -> Reader r Structure

Recursively walking an AST using foldl and the Reader monad without boilerplate

2 Answers2

A lensy approach