How to enumerate a recursive datatype in Haskell?

Question

This blog post has an interesting explanation of how to use the Omega monad to enumerate an arbitrary grammar diagonally. He offers an example of how to do so, resulting in an infinite sequence of strings. I'd like to do the same, except that, instead of generating a list of strings, it generates a list of an actual datatype. For example,

 data T = A | B T | C T T

Would generate

A, B A, C A A, C (B A) A...

Or something similar. Unfortunately my Haskell skills are still maturing and after some hours playing it I couldn't manage to do what I want. How can that be done?

As requested, one of my attempts (I have tried too many things...):

import Control.Monad.Omega

data T = A | B T | C T T deriving (Show)

a = [A] 
        ++ (do { x <- each a; return (B x) })
        ++ (do { x <- each a; y <- each a; return (C x y) })

main = print $ take 10 $ a

Would generate that under what condition? WHat you're asking is unclear - please don't expect us to read the article you posted and try to articulate your problem within the question. — ScarletAmaranth, May 07 '14 at 10:26
@ScarletAmaranth the question is just how to enumerate the possible values of a recursive datatype. There is not much to add, I just want a list of possible values for a datatype... — MaiaVictor, May 07 '14 at 10:47
Perhaps writing a [Universe](http://hackage.haskell.org/package/universe-1.0/docs/Data-Universe.html) instance for [Generics](http://www.haskell.org/haskellwiki/GHC.Generics) would work. — phipsgabler, May 07 '14 at 11:12

score 8 · Accepted Answer · edited May 23 '17 at 10:29

8

My first ugly approach was:

allTerms :: Omega T
allTerms = do
  which <- each [ 1,2,3 ]
  if which == 1 then
    return A
  else if which == 2 then do
    x <- allTerms
    return $ B x
  else do
    x <- allTerms
    y <- allTerms
    return $ C x y

But then, after some cleaning up I reached this one liner

import Control.Applicative
import Control.Monad.Omega
import Control.Monad

allTerms :: Omega T
allTerms = join $ each [return A, B <$> allTerms, C <$> allTerms <*> allTerms]

Note that order matters: return A has to be the first choice in the list above, or allTerms will not terminate. Basically, the Omega monad ensures a "fair scheduling" among choices, saving you from e.g. infiniteList ++ something, but does not prevent infinite recursion.

An even more elegant solution was suggested by Crazy FIZRUK, exploiting the Alternative instance of Omega.

import Control.Applicative
import Data.Foldable (asum)
import Control.Monad.Omega

allTerms :: Omega T
allTerms = asum [ pure A
                , B <$> allTerms
                , C <$> allTerms <*> allTerms
                ]

edited May 23 '17 at 10:29

Community

1
1

answered May 07 '14 at 12:16

chi

111,837
3
133
218

Yay! That is awesome, man! I was almost there, didn't think alternating it like you did. Thanks for the ugly version, I would not be able to comprehend it otherwise. Thank you! – MaiaVictor May 07 '14 at 12:23
5

I think with `Alternative` this would look even better: `enum = pure A <|> B <$> enum <|> C <$> enum <*> enum` – fizruk May 07 '14 at 12:26
@CrazyFIZRUK Indeed! I was looking for `<|>` but `Omega` comes with no `Alternative` instance. I believe `x <|> y = join $ each [x,y]` should work, though (even if it is not associative AFAICS). – chi May 07 '14 at 12:50
@chi there is `Alternative` (as well as `MonadPlus`) instance for `Omega` at least in the latest version: https://hackage.haskell.org/package/control-monad-omega-0.3.1/docs/Control-Monad-Omega.html – fizruk May 07 '14 at 13:28
Oh that Applicative version is beautiful. The Alternative is even more beautiful. Please could you edit that in too. – AndrewC May 07 '14 at 15:23
1

@AndrewC I agree. I just included it. – chi May 07 '14 at 17:02

score 6 · Answer 2 · edited May 23 '17 at 12:26

I finally found the time to write a generic version. It uses the Universe typeclass, which represents recursively enumerabley types. Here it is:

{-# LANGUAGE DeriveGeneric, TypeOperators, ScopedTypeVariables #-}
{-# LANGUAGE FlexibleInstances, FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances, OverlappingInstances #-}

import Data.Universe
import Control.Monad.Omega
import GHC.Generics
import Control.Monad (mplus, liftM2)

class GUniverse f where
    guniverse :: [f a]

instance GUniverse U1 where
    guniverse = [U1]

instance (Universe c) => GUniverse (K1 i c) where
    guniverse = fmap K1 (universe :: [c])

instance (GUniverse f) => GUniverse (M1 i c f) where
    guniverse = fmap M1 (guniverse :: [f p])

instance (GUniverse f, GUniverse g) => GUniverse (f :*: g) where
    guniverse = runOmega $ liftM2 (:*:) ls rs
        where ls = each (guniverse :: [f p])
              rs = each (guniverse :: [g p])

instance (GUniverse f, GUniverse g) => GUniverse (f :+: g) where
    guniverse = runOmega $ (fmap L1 $ ls) `mplus` (fmap R1 $ rs)
        where ls = each (guniverse :: [f p])
              rs = each (guniverse :: [g p])

instance (Generic a, GUniverse (Rep a)) => Universe a where
    universe = fmap to $ (guniverse :: [Rep a x])


data T = A | B T | C T T deriving (Show, Generic)
data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving (Show, Generic)

I couldn't find a way to remove UndecidableInstances, but that should be of no greater concern. OverlappingInstances is only required to override predefined Universe instances, like Either's. Now some nice outputs:

*Main> take 10 $ (universe :: [T])
[A,B A,B (B A),C A A,B (B (B A)),C A (B A),B (C A A),C (B A) A,B (B (B (B A))),C A (B (B A))]
*Main> take 20 $ (universe :: [Either Int Char])
[Left (-9223372036854775808),Right '\NUL',Left (-9223372036854775807),Right '\SOH',Left (-9223372036854775806),Right '\STX',Left (-9223372036854775805),Right '\ETX',Left (-9223372036854775804),Right '\EOT',Left (-9223372036854775803),Right '\ENQ',Left (-9223372036854775802),Right '\ACK',Left (-9223372036854775801),Right '\a',Left (-9223372036854775800),Right '\b',Left (-9223372036854775799),Right '\t']
*Main> take 10 $ (universe :: [Tree Bool])
[Leaf False,Leaf True,Branch (Leaf False) (Leaf False),Branch (Leaf False) (Leaf True),Branch (Leaf True) (Leaf False),Branch (Leaf False) (Branch (Leaf False) (Leaf False)),Branch (Leaf True) (Leaf True),Branch (Branch (Leaf False) (Leaf False)) (Leaf False),Branch (Leaf False) (Branch (Leaf False) (Leaf True)),Branch (Leaf True) (Branch (Leaf False) (Leaf False))]

I'm not exactly sure what happens in the branching order of mplus, but I think it should all work out if Omega is correctly implemented, which I strongly believe.

But wait! The above implementation is not yet bug-free; it diverges on "left recursive" types, like this:

data T3 = T3 T3 | T3' deriving (Show, Generic)

while this works:

data T6 = T6' | T6 T6 deriving (Show, Generic)

I'll see if I can fix that. EDIT: At some time, the solution of this problem might be found in this question.

Wow. What!? That is awesome. Thank you. I didn't understand your concerns about mplus? — MaiaVictor, May 07 '14 at 18:59
This is glorious. GHC is just deriving all the boilerplate code for free! — Piotr Miś, May 07 '14 at 19:17
@Viclib By that I mean the fact that `Branch (Leaf False) (Branch (Leaf False) (Leaf False))` comes before `Branch (Leaf True) (Leaf True)`. Omega does not generate all trees in lexical order or something, but traverses their space in a square-like manner. — phipsgabler, May 07 '14 at 20:07
Nice job. I think `mplus` should work OK: it is defined as `mplus (Omega xs) (Omega ys) = Omega (diagonal [xs,ys])` so it will roughly interleave the two lists. — chi, May 08 '14 at 09:55
Yeah; it was that "roughly" that irritated me, I didn't doubt `Omega`'s correctness. — phipsgabler, May 08 '14 at 10:00

Ingo · Answer 3 · 2014-05-07T11:31:39.247

3

You really should show us what you have tried so far. But granted, this is not an easy problem for a bgeinner.

Let's try to write a naive version down:

enum = A : (map B enum ++ [ C x y | x <- enum, y <- enum ])

Ok, this actually gives us:

[A, B A, B (B A), B (B (B A)), .... ]

and never reaches the C values.

We obviously need to construct the list in steps. Say we already have a complete list of items up to a certain nesting level, we can compute the items with one nesting level more in one step:

step xs = map B xs ++ [ C x y | x <- xs, y <- xs ]

For example, we get:

> step [A]
[B A,C A A]
> step (step [A])
[B (B A),B (C A A),C (B A) (B A),C (B A) (C A A),C (C A A) (B A),C (C A A) (C A ...

What we want is thus:

[A] ++ step [A] ++ step (step [A]) ++ .....

which is the concatenation of the result of

iterate step [A]

which is, of course

someT = concat (iterate step [A])

Warning: You will notice that this still does not give all values. For example:

C A (B (B A))

will be missing.

Can you find out why? Can you improve it?

edited May 07 '14 at 11:31

answered May 07 '14 at 11:21

Ingo

36,037
5
53
100

I've been trying a lot of things, actually, but sadly most of the time was actually spent trying to understand monads / how the Omega monad works... here is a snapshot of my source http://lpaste.net/103738 . Thank you. – MaiaVictor May 07 '14 at 11:25
Oh and I could understand your solution at first sight, as it involves no monads at all, thanks! Which leaves just 2 questions: 1. is this the same kind of diagonal enumeration the blogger was talking about? And, 2. if so, why was the Omega Monad necessary on that blog post, at all? – MaiaVictor May 07 '14 at 11:30
@Viclib I have not read the post. Please also note the final warning I have appended to my post. – Ingo May 07 '14 at 11:32
Are you sure `C A (B (B A))` is missing? After the first step there will be a `(B A)`. After 2nd there will be a `(B (B A))`. So, after third, there will be a `C A (B (B A))`. No? – MaiaVictor May 07 '14 at 11:36
It is missing. Trying to find it in the list will cause `out of memory` error. :) – Piotr Miś May 07 '14 at 11:42
Huh I guess what we want is `step xs = xs ++ map B xs ++ [ C x y | x <- xs, y <- xs ]` instead... – MaiaVictor May 07 '14 at 11:54
1

[alternating append](http://stackoverflow.com/a/20516638/849891) should help, `enum = A : (map B enum ++/ map (uncurry C) (pairup enum enum)) ; (x:xs) ++/ ys = x:(ys ++/ xs) ; pairup (x:xs) ys = map (x,) ys ++/ pairup xs ys`. I expect `C A (B (B A))` to be within reach. (not tested) – Will Ness May 07 '14 at 12:54
1

it seems `enum = A, B A, C A A, B (B A), ...` so `C A (B (B A))` should be at `enum !! (2+4*3)`. – Will Ness May 07 '14 at 13:21

score 3 · Answer 4 · answered May 07 '14 at 15:19

Below is a terrible solution, but perhaps an interesting one.

We might consider the idea of adding "one more layer"

grow :: T -> Omega T
grow t = each [A, B t, C t t]

which is close to correct but has a flaw—in particular, in the C branch, we end up having both of the arguments take the exact same values instead of being able to vary independently. We can fix this by computing the "base functor" of T which looks like this

data T    = A  | B  T | C  T T
data Tf x = Af | Bf x | Cf x x deriving Functor

In particular, Tf is just a copy of T where the recursive calls are functor "holes" instead of direct recursive calls. Now we can write:

grow :: Omega T -> Omega (Tf (Omega T))
grow ot = each [ Af, Bf ot, Cf ot ot ]

which has a whole computation of a new set of T in each hole. If we could somehow "flatten" the Omega (Tf (Omega T)) into Omega T then we'd have a computation which adds "one new layer" to our Omega computation correctly.

flatten :: Omega (Tf (Omega T)) -> Omega T
flatten = ...

and we could take the fixed point of this layering with fix

fix :: (a -> a) -> a

every :: Omega T
every = fix (flatten . grow)

So the only trick is to figure out flatten. To do this we need to notice two features of Tf. First, it's Traversable so we can use sequenceA to "flip" the order of Tf and Omega

flatten = ?f . fmap (?g . sequenceA)

where ?f :: Omega (Omega T) -> Omega T is just join. The final tricky bit is figuring out ?g :: Omega (Tf T) -> Omega T. Obviously, we don't care about the Omega layer so we should just fmap a function of type Tf T -> T.

And this function is very close to the defining notion for the relationship between Tf and T: we can always compress a layer of Tf on the top of T.

compress :: Tf T -> T
compress Af         = A
compress (Bf t)     = B t
compress (Cf t1 t2) = C t1 t2

All together we have

flatten :: Omega (Tf (Omega T)) -> Omega T
flatten = join . fmap (fmap compress . sequenceA)

Ugly, but all together functional.

How to enumerate a recursive datatype in Haskell?

4 Answers4

Linked