How can this Haskell combination algorithm be improved?

Question

How can the solution, presented below, to the following problem be improved? Can it be more efficient in time and space? Is there a space leak?

Problem: Given an input list of Astronauts, produce a list of pairs of Astronauts such that the pairs do not have two Astronauts from the same country. Assume the input has a list of Astronauts with unique identifiers.

data Astronaut = Astronaut { identifier :: Int, country :: String } deriving (Eq)

astronautPairs :: [Astronaut] -> [(Astronaut, Astronaut)]
astronautPairs xs = foldl accumulatePairs [] [(a, b) | a <- xs, b <- xs, country a /= country b]
    where
        accumulatePairs pairs pair = if hasPair pair pairs then pairs else pair:pairs
        hasPair pair@(a,b) ((c,d):xs) = a == d && b == c || hasPair pair xs
        hasPair _ [] = False

How much space do you think it should take? Once you decide that, simply run your code on a large input and see if your memory usage matches (the `-stderr` runtime option could come in handy here). Then you tell *us* if it has a space leak. — crockeea, Oct 09 '15 at 22:08
You might like the `pairs` function from [one of my previous answers](http://stackoverflow.com/a/6473153/791604). — Daniel Wagner, Oct 09 '15 at 22:27
Do you want to produce both `(a,b)` and `(b,a)`, or do you intend to represent two-person teams? — dfeuer, Oct 10 '15 at 04:23

score 2 · Accepted Answer · answered Oct 10 '15 at 05:56

2

Instead of eliminating the flipped pairs why not avoid generating them in the first place. It is we that produce them, yes?

import Data.List (tails)

astronautPairs :: [Astronaut] -> [(Astronaut, Astronaut)]
astronautPairs xs = [ (y,z) | (y:ys) <- tails xs, z <- .... , country y /= country .... ]

This assumes that the input list of astronauts has no duplicates.

Thus, we avoid duplicates by a triangular generation.

(I leave a part of code out, for you to complete).

answered Oct 10 '15 at 05:56

Will Ness

70,110
9
98
181

Yep, that looks right to me. What is the space complexity for this? – hamsterdam Oct 10 '15 at 06:35
time is quadratic, of course. space will depend on the length of the final list. Nothing precludes a compiler from making a code which will work in O(1) space in addition to the O(n) size of the input list which will have to be fully present in the memory -- i.e. this is not an on-line algorithm, GC won't be able to reclaim the head element of the list even as the last element is accessed. – Will Ness Oct 10 '15 at 22:12

score 0 · Answer 2 · answered Oct 10 '15 at 04:39

Let's step back from implementation details and think about what you're trying to accomplish:

produce a list of pairs of Astronauts such that the pairs do not have two Astronauts from the same country

You seem to be allowed to assume that each astronaut appears only once in the list.

One efficient way to approach this problem is to start by partitioning your list by country. A natural way to do this is to build a HashMap String [Int] that holds a list of all the astronauts from each country.

import qualified Data.HashMap.Strict as HMS
import Data.HashMap.Strict (HashMap)

divideAstronauts :: [Astronaut] -> HashMap String [Int]
divideAstronauts = foldl' go mempty where
  go hm (Astronaut ident cntry) = HMS.insertWith (++) cntry [ident] hm

Now you can divide up the rest of the program into two steps:

Choose all pairs of countries.
For every pair of countries, choose all pairs of astronauts such that each comes from one of those countries.

How can this Haskell combination algorithm be improved?

2 Answers2