28

tl;dr: how do you write instances of Arbitrary that don't explode if your data type allows for way too much nesting? And how would you guarantee these instances produce truly random specimens of your data structure?

I want to generate random tree structures, then test certain properties of these structures after I've mangled them with my library code. (NB: I'm writing an implementation of a subtyping algorithm, i.e. given a hierarchy of types, is type A a subtype of type B. This can be made arbitrarily complex, by including multiple-inheritance and post-initialization updates to the hierarchy. The classical method that supports neither of these is Schubert Numbering, and the latest result known to me is Alavi et al. 2008.)

Let's take the example of rose-trees, following Data.Tree:

data Tree a = Node a (Forest a)
type Forest a = [Tree a]

A very simple (and don't-try-this-at-home) instance of Arbitray would be:

instance (Arbitrary a) => Arbitrary (Tree a) where
    arbitrary = Node <$> arbitrary <$> arbitrary

Since a already has an Arbitrary instance as per the type constraint, and the Forest will have one, because [] is an instance, too, this seems straight-forward. It won't (typically) terminate for very obvious reasons: since the lists it generates are arbitrarily long, the structures become too large, and there's a good chance they won't fit into memory. Even a more conservative approach:

arbitrary = Node <$> arbitrary <*> oneof [arbitrary,return []]

won't work, again, for the same reason. One could tweak the size parameter, to keep the length of the lists down, but even that won't guarantee termination, since it's still multiple consecutive dice-rolls, and it can turn out quite badly (and I want the odd node with 100 children.)

Which means I need to limit the size of the entire tree. That is not so straight-forward. unordered-containers has it easy: just use fromList. This is not so easy here: How do you turn a list into a tree, randomly, and without incurring bias one way or the other (i.e. not favoring left-branches, or trees that are very left-leaning.)

Some sort of breadth-first construction (the functions provided by Data.Tree are all pre-order) from lists would be awesome, and I think I could write one, but it would turn out to be non-trivial. Since I'm using trees now, but will use even more complex stuff later on, I thought I might try to find a more general and less complex solution. Is there one, or will I have to resort to writing my own non-trivial Arbitrary generator? In the latter case, I might actually just resort to unit-tests, since this seems too much work.

Mikhail Glushenkov
  • 14,928
  • 3
  • 52
  • 65
Aleksandar Dimitrov
  • 9,275
  • 3
  • 41
  • 48
  • 5
    The [Quickcheck manual](http://www.cse.chalmers.se/~rjmh/QuickCheck/manual.html) explains this and has helpful examples (though they may be for QuickCheck v1); see the sections titled ["The size of test data"](http://www.cse.chalmers.se/~rjmh/QuickCheck/manual_body.html#15) and ["Generating recursive data types"](http://www.cse.chalmers.se/~rjmh/QuickCheck/manual_body.html#16). – Luis Casillas Apr 11 '13 at 22:10
  • This is great, thanks. I didn't read through the entire manual, but of course, I should have. I guess it was really hard (for me) to google for this. I think this should solve my problem. – Aleksandar Dimitrov Apr 11 '13 at 22:24

3 Answers3

22

Use sized:

instance Arbitrary a => Arbitrary (Tree a) where
arbitrary = sized arbTree

arbTree :: Arbitrary a => Int -> Gen (Tree a)
arbTree 0 = do
    a <- arbitrary
    return $ Node a []
arbTree n = do
    (Positive m) <- arbitrary
    let n' = n `div` (m + 1)
    f <- replicateM m (arbTree n')
    a <- arbitrary
    return $ Node a f

(Adapted from the QuickCheck presentation).

P.S. Perhaps this will generate overly balanced trees...

Joachim Breitner
  • 25,395
  • 6
  • 78
  • 139
Koterpillar
  • 7,883
  • 2
  • 25
  • 41
  • Great answer, thanks. That's basically in line with what @sacundim linked to in the manual. I hope this could be useful for others then, since the answer seems to be relatively straightforward. – Aleksandar Dimitrov Apr 11 '13 at 22:25
  • Yes, I tried to adapt this from binary trees though (in part because I realized this can help me on a project of mine too ;) ) – Koterpillar Apr 11 '13 at 22:26
  • 1
    A comment on the "too balanced" problem: instead of using `mapM` in this way, one could generate a random partition of an arbitrary length list, and take that as the basis for generating children. Oleg's random list shuffles might provide help. – Aleksandar Dimitrov Apr 11 '13 at 22:30
7

You might want to use the library presented in the paper "Feat: Functional Enumeration of Algebraic Types" at the Haskell Symposium 2012. It is on Hackage as testing-feat, and a video of the talk introducing it is available here: http://www.youtube.com/watch?v=HbX7pxYXsHg

2

As Janis mentioned, you can use the package testing-feat, which creates enumerations of arbitrary algebraic data types. This is the easiest way to create unbiased uniformly distributed generators for all trees of up to a given size.

Here is how you would use it for rose trees:

import Test.Feat (Enumerable(..), uniform, consts, funcurry)
import Test.Feat.Class (Constructor)
import Data.Tree (Tree(..))
import qualified Test.QuickCheck as QC

-- We make an enumerable instance by listing all constructors
-- for the type. In this case, we have one binary constructor:
-- Node :: a -> [Tree a] -> Tree a
instance Enumerable a => Enumerable (Tree a) where
    enumerate = consts [binary Node]
      where
        binary :: (a -> b -> c) -> Constructor c
        binary = unary . funcurry

-- Now we use the Enumerable instance to create an Arbitrary
-- instance with the help of the function:
-- uniform :: Enumerable a => Int -> QC.Gen a
instance Enumerable a => QC.Arbitrary (Tree a) where
    QC.arbitrary = QC.sized uniform
    -- QC.shrink = <some implementation>

The Enumerable instance can also be generated automatically with TemplateHaskell:

deriveEnumerable ''Tree
Hjulle
  • 2,471
  • 1
  • 22
  • 34
  • This example snippet does not appear to correspond to [testing-feat 1.1.0.0](https://hackage.haskell.org/package/testing-feat) of May 26, 2018. It appears that your example came only ~1½ month before breaking API changes. Perhaps this is only a matter of fixing imports, as e.g. `consts` is now only in Test.Feat.Class, and so on. But the deprecation for Test.Feat.Class asks one to use Control.Enumerable instead. – sshine Jan 08 '19 at 10:29
  • The import `Constructor` does not appear anywhere in 1.1.0.0, but was a type alias, [`type Constructor = Enumerate`](https://hackage.haskell.org/package/testing-feat-0.4.0.3/docs/src/Test-Feat-Class.html#Constructor) in 0.4.0.3. There's an [examples](https://github.com/JonasDuregard/testing-feat/tree/master/examples) directory, but it lacks an example as simple as this one. – sshine Jan 08 '19 at 10:36