Is there a built-in for this?

Question

I'm looking for a function of type

[[(a, b)]] -> [(a, [b])]

and Hoogle tells me there isn't one, so I've written this:

transpose :: (Eq a) => [[(a, b)]] -> [(a, [b])]
transpose alists = uncurry zip $ foldl combine ([], []) alists
    where combine memo new = foldl tally memo new
          tally ([], []) (k, v) = ([k], [[v]])
          tally ((ksHead:ksRest), (vsHead:vsRest)) (k, v) = 
              if k == ksHead
              then (ksHead:ksRest, (v:vsHead):vsRest)
              else (ksHead:ks, vsHead:vs)
                  where (ks, vs) = tally (ksRest, vsRest) (k, v)

In order of importance:

Is there actually a built-in for this that Hoogle doesn't know about?
Is transpose the right name for this thing?
Is there a better (more readable and/or more performant) way of writing this?

EDIT

Since someone's interested in the flavor:

I'm writing a stack ranking app, and ballots from the user come in in the form of [(Candidate, Rank)]. In order to compute the winner by, for example, Borda count, I need to tally the ranking of each Candidate by combining those ballots. Comments on the more general problem are also welcome.

I don't recognize this function, but `transpose` is already something else. — Ørjan Johansen, Jun 13 '14 at 14:56
What exactly do you want this function to do? Your implementation does `transpose [[(1, 2), (2, 3)], [(4, 5), (6, 7)], [(4, 10)]] == [(1,[2]),(2,[3]),(4,[10,5]),(6,[7])]`, which looks like it just does a `concat` then does a `groupBy` and then aggregates the second elements. You could definitely write this a lot simpler than you have, but I would doubt it already exists. — bheklilr, Jun 13 '14 at 14:58

score 6 · Answer 1 · answered Jun 13 '14 at 23:53

6

You should consider using a map data structure for this, which makes it much easier:

import Data.Map (Map)
import qualified Data.Map as Map

transpose: Ord a => [(k, v)] -> Map k [v]
transpose = Map.fromListWith (++) [] . map (\(k, v) -> (k, [v])

In fact, this is basically the example the documentation gives for fromListWith.

answered Jun 13 '14 at 23:53

Luis Casillas

29,802
7
49
102

Using `Map` for problems like this makes so much sense to me -- it's exactly the right concept. I don't understand why `map.group.sort` gets so much love. +1 – luqui Jun 14 '14 at 00:49
3

There is even package [multimap](http://hackage.haskell.org/package/multimap) for working with such maps, where `MultiMap k v` is isomorphic to `Map k [v]`. There you have directly `fromList :: Ord k => [(k, a)] -> MultiMap k a`. – Petr Jun 14 '14 at 08:44

score 5 · Accepted Answer · answered Jun 13 '14 at 15:02

5

transpose is usually used to mean a list operation along the lines of matrix transposition. In fact, Data.List.transpose does exactly that.

I'm not fully sure what your code is doing. It's honestly really convoluted. Do you mean something like this?

import Data.List
import Data.Function
import Control.Arrow

groupByKey :: Ord a => [[(a, b)]] -> [(a, [b])]
groupByKey = map (fst . head &&& map snd) . groupBy kEq . sortBy kCmp . concat
  where
    kEq = (==) `on` fst
    kCmp = compare `on` fst

If this is what you're doing, upgrading the constraint to Ord a improves the algorithm to O(n log n) instead of O(n ^ 2).

answered Jun 13 '14 at 15:02

Carl

26,500
4
65
86

+1 This is the exact solution I was just about to post to the letter (except I kept `kEq` and `kCmp` inline). – bheklilr Jun 13 '14 at 15:04
@bheklilr I had them inline, but the line got really long. So I split them out at the last moment. – Carl Jun 13 '14 at 15:04
I figured, considering this implementation still almost runs off the edge of the page. It isn't precisely the same implementation that OP used, since that one seems to reverse the order of the values, but in the majority of cases I doubt this wouldn't matter. – bheklilr Jun 13 '14 at 15:07
If this is what the OP intends, why would they want to take a list of lists? – dfeuer Jun 13 '14 at 15:07
2

@Inaimathi expanding what dfeuer has said, just keep your data as `:: [Map Candidate Rank]`, and then [`foldl (unionWith (++)) empty`](http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Map-Lazy.html) them. – Will Ness Jun 13 '14 at 16:06
1

@WillNess - For the record, your commend is what I ended up actually using, but I've had quite a bit of fun playing with `on` and `&&&`. – Inaimathi Jun 13 '14 at 21:18
you could stick with lists too, switching to `Ord a` constraint as Carl says, and then, assuming each list is already sorted by key, [tree-fold](https://en.wikipedia.org/wiki/Fold_%28higher-order_function%29#Tree-like_folds) them together with a specialized merge. – Will Ness Jun 14 '14 at 08:03

score 1 · Answer 3 · edited May 23 '17 at 10:25

You would help yourself to uncover the hidden structure in your code by using one-letter variables, not to get distracted by the meaning implied by your name choices:

g :: (Eq a) => [[(a, b)]] -> [(a, [b])]
g alists = uncurry zip $ foldl (foldl f) ([], []) alists 
    where 
          f ([], []) (k, v) = ([k], [[v]])
          f ((h:t), (u:s)) (k, v)  
            | k == h    = (h:t, (v:u):s)
            | otherwise = (h:q,    u :r) where (q, r) = f (t, s) (k, v)
                        -- ((h:) *** (u:)) $ f (t,s) (k,v)

this is a bit unnatural, working on unzipped interim data and zipping them back for output. No need for that, we can work with the same type of interim data as output:

g2 :: (Eq a) => [[(a, b)]] -> [(a, [b])]
g2 alists = foldl (foldl f) [] alists 
    where 
          f [] (k, v) = [(k,[v])]
          f ((a,b):t) (k, v)  
            | k == a    = (a,v:b):t
            | otherwise = (a,b):f t (k,v)

Now it is clear that f is a kind of "insert", a paramorphism,

          f xs (k, v) = para (\(a,b) t r -> if a==k then (a,v:b):t 
                                                    else (a,  b):r) 
                             [(k,[v])] xs
para c z []    = z
para c z (x:t) = c x t $ para c z t

foldl (foldl f) [] alists === foldl f [] $ concat alists, and if it is possible to switch to the (Ord a) constraint, then the efficiency can be improved with reimplemented f,

g3 :: (Ord a) => [[(a, b)]] -> [(a, [b])]
g3 = foldl f [] . concat
  where
    f xs (k, v) = para (\(a,b) t r -> if k < a then (k,[v]):(a,b):t 
                                  else if a==k then (a,v:b):t
                                  else              (a,  b):r) 
                       [(k,[v])] xs

to improve the complexity of this code further, we can go the other route (than the concat) and join the input lists through a tree of merges,

g4  :: (Ord a) => [[(a, b)]] -> [(a, [b])]
g4 alists = foldt u [] 
              . map (map (\(a,b) -> (a,[b])) . sortBy (comparing fst))
              $ alists
  where
    u xs [] = xs
    u [] ys = ys
    u xs@((a,b):t) ys@((c,d):r) = case compare a c of
        LT -> (a,b)    : u t ys
        EQ -> (a,b++d) : u t  r
        GT -> (c,d)    : u xs r

foldt f z []  = z
foldt f z [x] = x
foldt f z xs  = foldt f z $ pairwise f xs   -- tree-like folding
pairwise f (a:b:t) = f a b : pairwise f t
pairwise f xs      = xs

comparing is from Data.Ord. If your data pieces come in already sorted (which is likely in your scenario), you can omit the sortBy part for an additional algorithmic gain. Thus, this version is a kind of mergesort (possibly doing only merging, without the sorting).

Is there a built-in for this?

3 Answers3