2

For some simple ADTs, we can obtain the ADT of sets of that ADT as data Set = Full Set (*repeated N times) | Empty Set(*repeated N times), where N is the number of non terminal constructors of that ADT. For a more solid example, take Nat:

data Nat = Succ Nat | Zero

We can obtain a type for sets of nats as follows:

data NatSet = Full NatSet | Empty NatSet

So, for example,

empty :: NatSet
empty = Empty empty

insert :: Nat -> NatSet -> NatSet
insert Zero (Empty natverse)       = Full natverse
insert Zero (Full natverse)        = Full natverse
insert (Succ nat) (Empty natverse) = Empty (insert nat natverse)
insert (Succ nat) (Full natverse)  = Full (insert nat natverse)

member :: Nat -> NatSet -> Bool 
member Zero (Full natverse)        = True
member Zero (Empty natverse)       = False
member (Succ nat) (Empty natverse) = member nat natverse
member (Succ nat) (Full natverse)  = member nat natverse

main = do
    let set = foldr insert empty [Zero, Succ (Succ Zero), Succ (Succ (Succ (Succ (Succ Zero))))]
    print $ member Zero set
    print $ member (Succ Zero) set
    print $ member (Succ (Succ Zero)) set
    print $ member (Succ (Succ (Succ Zero))) set
    print $ member (Succ (Succ (Succ (Succ Zero)))) set
    print $ member (Succ (Succ (Succ (Succ (Succ Zero))))) set

For other types, it is equally simple:

data Bin = A Bin | B Bin | C
data BinSet = Empty BinSet BinSet | Full BinSet BinSet

But what about types that branch? It isn't obvious to me:

data Tree = Branch Tree Tree | Tip
data TreeSet = ???

Is there any type-algebraic argument that maps ADTs to the ADTs of sets of that type?

MaiaVictor
  • 51,090
  • 44
  • 144
  • 286
  • (I just noticed this `BinSet` type is useless since for strict evaluators it won't halt, and for lazy evaluators it will build huge thunks and essentially be `O(N)`. This could be fixed adding a terminal constructor.) – MaiaVictor May 09 '16 at 13:41
  • 1
    Interesting! I don't have an answer for you right now, but I'll point out that `NatSet` is equivalent to `Stream Bool` and `Nat` is the type of positions in a stream. So I think your question really boils down to _"`Nat` is to `Stream` as `Tree` is to `?`"_ I've also been trying to understand this sort of thing recently: see [my question](http://stackoverflow.com/questions/36667541/indexing-into-containers-the-mathematical-underpinnings) which didn't really provoke any definitive answers. – Benjamin Hodgson May 09 '16 at 14:00
  • 1
    Thinking about it a bit more, I think your `Tree` type identifies positions in a kinda stream-like-thing where each cell could be labelled one of two ways. For want of a better name, `data TreeContainer a = L a (TreeContainer a) | R a (TreeContainer a)`, then `type TreeSet = TreeContainer Bool`. – Benjamin Hodgson May 09 '16 at 14:04
  • 1
    I sat down and bashed it out: http://pastebin.com/xcd7kbtr. It doesn't work. You were right to suspect that product types are the problem. You can't create an empty set of a product type - see line 56 of the paste. (That is, you can't give an empty `TreeSet`.) I tried adding a hacky `Empty` constructor to `SetF (f :* g)` but that broke `insert_`. – Benjamin Hodgson May 09 '16 at 16:20
  • @BenjaminHodgson that is all so interesting, thank you! So, in short, it is illogical to look for a `TreeContainer`? – MaiaVictor May 09 '16 at 19:29
  • I wrote it up as a full answer – Benjamin Hodgson May 10 '16 at 13:29

1 Answers1

2

Let's take another look at your set type.

data NatSet = Full NatSet | Empty NatSet

There's always another NatSet inside this one; the two branches of the sum type indicate whether the current Nat is present in the set. This is a structural representation of sets. As you observed, the structure of the set depends on the structure of the values.

It's equivalent to a stream of Booleans: we test for membership by indexing into the stream.

data NatSet = NatSet Bool NatSet

empty = NatSet False empty

insert Z (NatSet _ bs) = NatSet True bs
insert (S n) (NatSet b bs) = NatSet b (insert n bs)

(NatSet b _) `contains` Z = b
(NatSet _ bs) `contains` (S n) = bs `contains` n

Based on the insight that set membership is like indexing into a collection of Booleans, let's pursue a generic implementation of sets of values of types formed as the fixed point of a polynomial functor (what's the problem?).


First things first. As you observed, the set's structure depends on the type of the things inside it. Here's the class of things which can be elements of a set,

class El a where
    data Set a
    empty :: Set a
    full :: Set a

    insert :: a -> Set a -> Set a
    remove :: a -> Set a -> Set a

    contains :: Set a -> a -> Bool

For a first example, I'll modify NatSet from above to suit this format.

instance El Nat where
    data Set Nat = NatSet Bool (Set Nat)

    empty = NatSet False empty
    full = NatSet True empty

    insert Z (NatSet _ bs) = NatSet True bs
    insert (S n) (NatSet b bs) = NatSet b (insert n bs)

    remove Z (NatSet _ bs) = NatSet False bs
    remove (S n) (NatSet b bs) = NatSet b (remove n bs)

    (NatSet b _) `contains` Z = b
    (NatSet _ bs) `contains` (S n) = bs `contains` n

Another simple El instance which we'll need later is (). A set of ()s is either full or empty, with nothing in between.

instance El () where
    newtype Set () = UnitSet Bool
    empty = UnitSet False
    full = UnitSet True
    insert () _ = UnitSet True
    remove () _ = UnitSet False
    (UnitSet b) `contains` () = b

The fixed point of a functor f,

newtype Fix f = Fix (f (Fix f))

in an instance of El whenever f is an instance of the following El1 class.

class El1 f where
    data Set1 f a
    empty1 :: El a => Set1 f (Set a)
    full1 :: El a => Set1 f (Set a)
    insert1 :: El a => f a -> Set1 f (Set a) -> Set1 f (Set a)
    remove1 :: El a => f a -> Set1 f (Set a) -> Set1 f (Set a)
    contains1 :: El a => Set1 f (Set a) -> f a -> Bool

instance El1 f => El (Fix f) where
    newtype Set (Fix f) = FixSet (Set1 f (Set (Fix f)))
    empty = FixSet empty1
    full = FixSet full1
    insert (Fix x) (FixSet s) = FixSet (insert1 x s)
    remove (Fix x) (FixSet s) = FixSet (remove1 x s)
    (FixSet s) `contains` (Fix x) = s `contains1` x

As usual we'll be composing bits of functors into bigger functors, before taking the resulting functor's fixed point to produce a concrete type.

newtype I a = I a
newtype K c a = K c
data (f :* g) a = f a :* g a
data (f :+ g) a = L (f a) | R (g a)
type N = K ()

For example,

type Nat = Fix (N :+ N)
type Tree = Fix (N :+ (I :* I))

Let's make sets of these things.

I (for identity) is the location in the polynomial where Fix will plug in the recursive substructure. We can just delegate its implementation of El_ to Fix's.

instance El1 I where
    newtype Set1 I a = ISet1 a
    empty1 = ISet1 empty
    full1 = ISet1 full
    insert1 (I x) (ISet1 s) = ISet1 (insert x s)
    remove1 (I x) (ISet1 s) = ISet1 (remove x s)
    (ISet1 s) `contains1` (I x) = s `contains` x

The constant functor K c features no recursive substructure, but it does feature a value of type c. If c can be put in a set, K c can be too.

instance El c => El1 (K c) where
    newtype Set1 (K c) a = KSet1 (Set c)
    empty1 = KSet1 empty
    full1 = KSet_ full
    insert1 (K x) (KSet1 s) = KSet1 (insert x s)
    remove1 (K x) (KSet1 s) = KSet1 (remove x s)
    (KSet1 s) `contains1` (K x) = s `contains` x

Note that this definition makes Set1 N isomorphic to Bool.

For sums, let's use our intuition that testing for membership is like indexing. When you index into a tuple, you choose between the left-hand and right-hand members of the tuple.

instance (El1 f, El1 g) => El1 (f :+ g) where
    data Set1 (f :+ g) a = SumSet1 (Set1 f a) (Set1 g a)
    empty1 = SumSet1 empty1 empty1
    full1 = SumSet1 full1 full1
    insert1 (L x) (SumSet1 l r) = SumSet1 (insert1 x l) r
    insert1 (R y) (SumSet1 l r) = SumSet1 l (insert1 y r)
    remove1 (L x) (SumSet1 l r) = SumSet1 (remove1 x l) r
    remove1 (R y) (SumSet1 l r) = SumSet1 l (remove1 y r)
    (SumSet1 l r) `contains1` (L x) = l `contains1` x
    (SumSet1 l r) `contains1` (R y) = r `contains1` y

This is now enough to do sets of natural numbers. With appropriate instances of Show, you can do this:

ghci> let z = Fix (L (K ()))
ghci> let s n = Fix (R (I n))
ghci> insert (s z) empty `contains` z
False
ghci> insert (s z) empty `contains` (s z)
True
ghci> empty :: Set (Fix (N :+ I))
FixSet (SumSet1 (KSet1 (UnitSet False)) (ISet1 (FixSet (  -- to infinity and beyond

I still haven't answered your original question, which was how should this work for product types? I can think of a few strategies, but none of them actually work.

We could make Set1 (f :* g) a a sum type. This has a pleasing symmetry: sets of sums are products, and sets of products are sums. In the context of the indexing idea, this is like saying "in order to get a Bool out of the set, you have to give an index to handle each of the two possible cases", rather like Either a b's elimination rule (a -> c) -> (b -> c) -> Either a b -> c. However, you get stuck when you try to come up with meaningful values for empty1 and full1:

instance (El1 f, El1 g) => El1 (f :* g) where
    data Set1 (f :* g) a = LSet1 (Set1 f a) | RSet1 (Set1 g a)
    insert1 (l :* r) (LSet1 x) = LSet1 (insert1 l x)
    insert1 (l :* r) (RSet1 y) = RSet1 (insert1 r y)
    remove1 (l :* r) (LSet1 x) = LSet1 (remove1 l x)
    remove1 (l :* r) (RSet1 y) = RSet1 (remove1 r y)
    (LSet1 x) `contains1` (l :* r) = x `contains1` l
    (RSet1 y) `contains1` (l :* r) = y `contains1` r
    empty1 = _
    full1 = _

You could try adding hacky Empty and Full constructors to Set1 (f :* g) but then you'd struggle to implement insert1 and remove1.

You could interpret a set of a product type as a pair of the sets of the two halves of the product. An item is in the set if both of its halves are in their respective sets. Like a sort of generalised intersection.

instance (El1 f, El1 g) => El1 (f :* g) where
    data Set1 (f :* g) a = ProdSet1 (Set1 f a) (Set1 g a)
    insert1 (l :* r) (ProdSet1 s t) = ProdSet1 (insert1 l s) (insert1 r t)
    remove1 (l :* r) (ProdSet1 s t) = ProdSet1 (remove1 l s) (remove1 r t)
    (ProdSet1 s t) `contains1` (l :* r) = (s `contains1` l) && (t `contains1` r)
    empty1 = ProdSet1 empty1 empty1
    full1 = ProdSet1 full1 full1

But this implementation doesn't work correctly. Observe:

ghci> let tip = Fix (L (K ()))
ghci> let fork l r = Fix (R (I l :* I r))
ghci> let s1 = insert (fork tip tip) empty
ghci> let s2 = remove (fork tip (fork tip tip)) s1
ghci> s2 `contains` (fork tip tip)
False

Removing fork tip (fork tip tip) also removed fork tip tip. tip got removed from the left-hand half of the set, which meant any tree whose left-hand branch is tip got removed with it. We removed more items than we intended. (However, if you don't need a remove operation on your sets, this implementation works - though that's just another disappointing asymmetry.) You could implement contains with || instead of && but then you'll end up inserting more items than you intended.

Finally, I also thought about treating a set of products as a set of sets.

data Set1 (f :* g) a = ProdSet1 (Set1 f (Set1 g a))

This doesn't work - try implementing any of El1's methods and you'll get stuck right away.

So at the end of the day your intuition was right. Product types are the problem. There's no meaningful structural representation of sets for product types.

Of course, this is all academic really. Fixed points of polynomial functors have a well-defined notion of equality and ordering, so you can do what everyone else does and use Data.Set.Set to represent sets, even for product types. It won't have such good asymptotics, though: equality and ordering tests on such values are O(n) (where n is the size of the value), making membership operations on such sets O(n log m) (where m is the size of the set) because the set itself is represented as a balanced tree. Contrast with our generic structural sets, where membership operations are O(n).

Community
  • 1
  • 1
Benjamin Hodgson
  • 42,952
  • 15
  • 108
  • 157