How to generate the cartesian join of a list of tuples based on their first value?

Question

I have a list of tuple of type (Int,String) and i want to generate a list of type [String] in which each string is a cartesian join permutation of second elements of the tuples that have the same Int value. For example:

input: [(1,"abc"),(1,"def"),(2,"ghi"),(2,"kl")]

output: ["ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf", "gk", "gl", "hk", "hl", "ik", "il"]

I tried this but i can't find a way of just permuting the tuples with the same Int value:

possible_keys :: [(Int,String)] -> [String]
possible_keys subkeys = [ key | keysize <- keysizes, key<-keys]
  where keysizes = map (fst) subkeys
        keys = sequence (map (snd) subkeys)

Any hint?

@duplode if there is just one tuple , that tuple is ignored . if there's more than 2 tuples with the same int value like in your exampe it would give ["adx", "ady","adz","aex","aey","ayz","afx","afy","afz","bex","bey"...] — Guilherme Lima, Apr 22 '19 at 01:10
That's fine -- I had looked only at your example, and didn't notice you were already fine with using `sequence` for the permutations. — duplode, Apr 22 '19 at 01:11
yeah it is cycling ok but i still can't find a way of filtering the permutations with other tuples rather than those with the same int value — Guilherme Lima, Apr 22 '19 at 01:13

duplode · Answer 1 · 2019-04-22T02:32:52.097

1

You can use groupBy to keep the groups of tuples apart:

import Data.Function (on)
import Data.Ord (comparing)
import Data.List

possible_keys :: Ord k => [(k, [c])] -> [[c]]
possible_keys = concat . fmap sequenceA . fmap (fmap snd)
    . groupBy ((==) `on` fst) . sortBy (comparing fst)

(I'm assuming the input list isn't necessarily sorted. I'm sorting by the keys alone to make it minimally invasive. comparing fst is the same as compare `on` fst; unfortunately there isn't an analogous equating in the standard library yet. sequenceA is the same as sequence, only more general.)

This can be slimmed down a bit. The second functor law allows us to combine consecutive uses of fmap:

possible_keys :: Ord k => [(k, [c])] -> [[c]]
possible_keys = concat . fmap (sequenceA . fmap snd)
    . groupBy ((==) `on` fst) . sortBy (comparing fst)

fmap followed by sequenceA is traverse:

possible_keys :: Ord k => [(k, [c])] -> [[c]]
possible_keys = concat . fmap (traverse snd)
    . groupBy ((==) `on` fst) . sortBy (comparing fst)

Finally, fmap followed by concat on a list is concatMap (or (=<<), but that would be a little too cryptic here):

possible_keys :: Ord k => [(k, [c])] -> [[c]]
possible_keys = concatMap (traverse snd)
    . groupBy ((==) `on` fst) . sortBy (comparing fst)

Note that this will generate strings of length one for keys with just one tuple -- sequenceA ["abc"] is ["a","b","c"]. If you don't want that, you can filter the groups of tuples immediately after the groupBy to get rid of those with just one tuple.

edited Apr 22 '19 at 02:32

answered Apr 22 '19 at 01:27

duplode

33,731
7
79
150

there's also [`sortOn`](https://hackage.haskell.org/package/base-4.12.0.0/docs/Data-List.html#v:sortOn) now. I wonder if it's smart enough to not do the whole decorate-undecorate thing for such simple accessors as `fst`, though. Probably not. – Will Ness Apr 22 '19 at 02:00
@WillNess Last time I tried to use `sortOn` here, [dfeuer told us](https://stackoverflow.com/questions/53491403/is-there-a-short-lambda-expression-for-a-a-ordering/53572182#comment94656383_53491426) that performance-wise it is only worth using if the projection function you give to it is expensive. (As you note in your edited comment.) – duplode Apr 22 '19 at 02:04
1

yeah, thought of it in the mean time. :) that's unfortunate; the more detail a programmer needs to concern themself with, the worse. I'm a radical here; I'd like to see a language with e.g. sequences, where the compiler itself would choose lists or trees or arrays or whatever, as needed. (I'm not building it though...:) ). – Will Ness Apr 22 '19 at 02:07
1

@WillNess Unfortunate indeed. I kind of feel the "sortOn" name is too inviting for such a special-purpose function... – duplode Apr 22 '19 at 02:23
2

@WillNess, duplode, there is a function that does the job, but it's buried in `GHC.Exts`: `sortWith`. – dfeuer Apr 22 '19 at 02:43
2

@dfeuer Thanks for the pointer. And there is a sorting `groupWith` there as well! – duplode Apr 22 '19 at 02:50
@dfeuer, duplode thanks. but, when an accessor is heavy, the decorate-sort-undecorate transform is too important to forgo for a few keystrokes. turns out `groupWith` uses `sortWith`, which is just a shorter way to key in `sortBy . comparing`. – Will Ness Apr 22 '19 at 07:18

How to generate the cartesian join of a list of tuples based on their first value?

1 Answers1