
I was looking at Data.Set, and I found out that it has no powerset function. Why?

I can implement it like this:

import Data.Set (Set, fromList, toList)

powerset :: (Ord a) => Set a -> Set (Set a)
powerset s = fromList $ map fromList (powerList $ toList s)

powerList :: [a] -> [[a]]
powerList [] = [[]]
powerList (x:xs) = powerList xs ++ map (x:) (powerList xs)

But this doesn't seem like the most efficient way to do it. OK, I can also write

import Control.Monad (filterM)

powerList :: [a] -> [[a]]
powerList = filterM (const [True, False])
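For reference, the `filterM` version works because, in the list monad, each element is independently kept or dropped, enumerating all 2^n choices. A quick check:

```haskell
import Control.Monad (filterM)

-- Each element is independently kept (True) or dropped (False),
-- so an n-element list yields 2^n sublists.
powerList :: [a] -> [[a]]
powerList = filterM (const [True, False])

main :: IO ()
main = print (powerList [1, 2 :: Int])
-- prints [[1,2],[1],[2],[]]
```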

but still, I wonder why Data.Set has no powerset function.

Also, what is the best way to write powerset :: (Ord a) => Set a -> Set (Set a)?

MarcoS
    It isn't a very commonly used function. – augustss Jun 21 '11 at 16:15
    I think the interesting thing here is if we can write something taking advantage of the special properties of `Set` to be more efficient than a naive solution. But I don't think such a thing is very easy. And since Set is spine-strict, and since powerSet generates 2^n elements regardless, even if you make things as efficient as possible, you're not getting a whole bunch of mileage out of those optimizations... – sclv Jun 21 '11 at 16:19
    @sclv: The `Ord` instance on `Set` is lexicographical order on the sorted list of elements. When constructing the power set, first remove the least element `x` and take the power set of the remainder, to get `y`. Every element of `y` will be greater than every element of `map (insert x) y`. – C. A. McCann Jun 21 '11 at 16:34

1 Answer


Funny, I actually implemented powerset in Haskell the other day just for fun in a comment at /r/python.

import Data.Set
import Prelude hiding (map)

powerset :: Ord a => Set a -> Set (Set a)
powerset s
    | s == empty = singleton empty
    | otherwise = map (insert x) pxs `union` pxs
        where (x, xs) = deleteFindMin s
              pxs = powerset xs

This is essentially the approach camccann described in his comment above. Set's fast union should give it a speed boost over the list version.
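As a quick sanity check, here is the same algorithm rewritten with a qualified import so it stands alone, applied to a two-element set:

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Same idea: split off the minimum element, take the power set of the
-- rest, then union it with copies that re-include the minimum.
powerset :: Ord a => Set a -> Set (Set a)
powerset s
  | Set.null s = Set.singleton Set.empty
  | otherwise  = Set.map (Set.insert x) pxs `Set.union` pxs
  where
    (x, xs) = Set.deleteFindMin s
    pxs     = powerset xs

main :: IO ()
main = print (powerset (Set.fromList "ab"))
-- prints fromList [fromList "",fromList "a",fromList "ab",fromList "b"]
```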

Dan Burton
    Note that this is probably still less efficient than an algorithm with access to the internals of `Set`. The representation is a balanced binary tree, I believe, and the union here will always be two sets of equal size with all elements larger/smaller than all elements of the other, so it should be sufficient to merely sum the sizes and make a new root with the two sets as branches. – C. A. McCann Jun 21 '11 at 17:31
  • @camccann this is true. I would be very interested to see such an implementation, and an analysis of its efficiency. – Dan Burton Jun 21 '11 at 17:35
  • Also, this algorithm may do unnecessary work deleting and inserting elements from the inner sets. Since only minimal elements are used, there are probably shortcuts available to avoid some rebalancing at intermediate steps. Anyway, I would be surprised if some analysis of this doesn't already exist, since powersets and balanced binary trees are not exactly obscure concepts. – C. A. McCann Jun 21 '11 at 17:42
Out of curiosity, is this any more efficient than `map Data.Set.fromAscList . filterM (const [False,True]) . Data.Set.toList`? If you're going to enumerate `2^n` subsets anyway, why not let the lists do the heavy work? You lose sharing (in a big way!) but that's the only negative I can think of. – yatima2975 Jun 21 '11 at 22:14
  • I bet you can't use `powerset` if your original set has more than, say, 20 elements. – augustss Jun 21 '11 at 23:28
  • I didn't mean your particular implementation, I meant in general. – augustss Jun 21 '11 at 23:50
  • @augustss: Actually, the cute `filterM` version seems to work fine for me in GHCi with 20 elements. It started going downhill around 18 though, and I doubt it'd make it to 30. – C. A. McCann Jun 22 '11 at 04:59
  • @camccann: I was in fact wondering why `Data.Set` does not provide an efficient implementation of powerset exploiting its internals. – MarcoS Jun 22 '11 at 10:59
    @MarcoS: That sort of question is hard to answer without asking the people who implemented it, but probably because of the point that augustss is indirectly making: The size of the power set grows exponentially, which leaves a very narrow range of inputs expensive enough to care about efficiency, but cheap enough to compute at all. – C. A. McCann Jun 22 '11 at 13:43
  • The corollary to this is that, if someone *really* needs to work with the power set of a larger input, they'll probably want to roll their own algorithms anyway instead of using `Data.Set`, in order to exploit any additional regularity of the problem domain or such. – C. A. McCann Jun 22 '11 at 13:45
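One middle ground, following camccann's observation, needs no access to `Set` internals: since `x` is the minimum of `s`, `insert x` is monotonic with respect to the lexicographic order on subsets, so `Set.map` (which re-sorts its results) can be replaced by `Set.mapMonotonic` (which rebuilds the tree structurally). This is only a sketch of the optimization, not a tuned implementation:

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Sketch: x is smaller than every element of xs, so inserting it into
-- each subset preserves their lexicographic order. mapMonotonic then
-- avoids the comparisons and rebalancing that Set.map would perform.
powerset' :: Ord a => Set a -> Set (Set a)
powerset' s
  | Set.null s = Set.singleton Set.empty
  | otherwise  = Set.mapMonotonic (Set.insert x) pxs `Set.union` pxs
  where
    (x, xs) = Set.deleteFindMin s
    pxs     = powerset' xs
```

The `union` still does linear work per level, but the per-subset `insert` cost drops, which is the part camccann's comments suggest an internals-aware version could shrink further.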