3

I'd like to generate all possible pairs of pairs from a list, with the restriction that (a,b) == (b,a) and that ((a,b),(c,d)) == ((c,d),(a,b)) for all a, b, c, d. Additionally, I can assume that all elements in the list I supply as an argument are distinct.

The first thing I did was to write down this list comprehension:

pairsOfPairs :: [a] -> [((a,a),(a,a))]
pairsOfPairs xs = [((w,x),(y,z)) | w <- xs, x <- xs, y <- xs, z <- xs,
                      w < x, y < z, w < y, w /= z, x /= y, x /= z]

This has the virtue of being idiomatic, but is very slow (profiling reveals that close to 90% of the running time was spent in this function and another, similar, function).

The reason for the slowness is that for a list of n elements it generates n^4 candidate pairs of pairs, but the restrictions eventually cut it down to n!/(8 * (n-4)!), meaning that we are doing at least 8 times too much work.

Is there a way to rewrite the function pairsOfPairs that will give it a speed boost? Obviously it will still be O(n^4), but I'm hoping to bring down the constant.


Edit: In fact, I almost always call this function with a list of length 5, meaning that there are 5!/8 = 15 elements in the result, but the function generates a list of 5^4 = 625 elements as an intermediate step. If I could eliminate all of these intermediate elements I'd therefore get a speedup of around 40x!

Chris Taylor
  • 46,912
  • 15
  • 110
  • 154
  • Your first paragraph seems to imply that a = b = c = d. I don't think that is what you mean. – dave4420 Jan 04 '12 at 18:03
  • @dave4420 I'm using '==' to mean equivalence, i.e. that the pair (a,b) should be counted as equivalent to the pair (b,a), but not that a and b must be equal. – Chris Taylor Jan 04 '12 at 18:08

1 Answers1

8

A simple way to cut down the amount of work done is to filter as early as possible.

pairsOfPairs :: Ord a => [a] -> [((a,a),(a,a))]
pairsOfPairs xs = [((w,x),(y,z)) | w <- xs, x <- xs, w < x, y <- xs, w < y, x /= y, 
                                   z <- xs, y < z, w /= z, x /= z]

By checking the conditions as soon as they are available, we avoid O(n^2) work for each pair (w,x) with x <= w etc. That's not too bad.

But more can be gained by preprocessing the list, if it is sorted, we can avoid almost all unnecessary work by choosing pairs like

ordPairs :: [a] -> [(a,a)]
ordPairs (x:xs) = [(x,y) | y <- xs] ++ ordPairs xs
ordPairs [] = []
Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
  • A simple re-ordering of the bindings and filters in the list comprehension has made a big speed difference - thanks a lot! Out of curiosity, is there a reason why GHC can't optimize this itself? Is it because list comprehensions are sugar for do notation, and do blocks are imperative? – Chris Taylor Jan 04 '12 at 17:21
  • 4
    You might also like the implementation from [this answer](http://stackoverflow.com/questions/6472883/using-list-elements-and-indices-together/6473153#6473153): `pairs xs = [(x, y) | x:ys <- tails xs, y <- ys]`. – Daniel Wagner Jan 04 '12 at 17:45
  • Very nice, @Daniel, I like that. – Daniel Fischer Jan 04 '12 at 18:16
  • 4
    @Chris do blocks are syntactic sugar for `(>>=)` and `(>>)` chains, they're not imperative as such. They tend to look imperative, sometimes that makes the code clearer, other times not. The only reason GHC can't make such a transformation itself is because it's not implemented. I tend to believe it's not implemented because it's hard to verify that reordering doesn't change semantics and improves performance, and there simply aren't enough GHC hackers, so nobody had the time to do it so far. – Daniel Fischer Jan 04 '12 at 18:25