Join or merge function in haskell

Question

Is there a function in Haskell to do the equivalent of an SQL join or an R merge ?

Basically I have two list of tuples and would like to 'zip' them according to their key. I know there is only one or zero value for each key

a = [(1, "hello"), (2, "world")]
b = [(3, "foo"), (1, "bar")]

And get something like

 [ (Just (1, "hello), Just (1, "bar))
 , (Just (2, "world), Nothing)
 , (Nothing         , Just (3, "foo"))
 ]

Types could be either `[a] -> [b] -> [(Maybe a), (Maybe b)]` or `[a] -> [b] -> [([a], (b)]` to manage collision. — mb14, Jun 26 '14 at 07:21
I'd expect `listJoinBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a,b)]` or `listJoin :: (Eq c) => (a -> c) -> (b -> c) -> [a] -> [b] -> [(a,b)]`. The latter would be useful with record syntax and can be implemented in terms of the former. — Franky, Jun 26 '14 at 07:34
@Franky with only `Eq` restriction the minimum complexity is `O(n^2)`, **when possible**, is better restrict to `Ord` (with complexity `O(n log n)`) — josejuan, Jun 26 '14 at 07:45
this is better represented with `Data.These` since `(Nothing, Nothing)` is impossible. Related thread: http://stackoverflow.com/questions/18210765/instance-alternative-ziplist-in-haskell — Will Ness, Jun 26 '14 at 08:59

josejuan · Answer 1 · 2014-06-26T13:46:21.873

With ordered sets (lists) and Ord key

key = fst

fullOuterJoin xs [] = map (\x -> (Just x, Nothing)) xs
fullOuterJoin [] ys = map (\y -> (Nothing, Just y)) ys
fullOuterJoin xss@(x:xs) yss@(y:ys) =
  if key x == key y
    then (Just x, Just y): fullOuterJoin xs ys
    else
      if key x < key y
        then (Just x, Nothing): fullOuterJoin xs yss
        else (Nothing, Just y): fullOuterJoin xss ys

(complexity is O(n+m) but if you must to sort then is O(n log n + m log m))

example

setA = [(1, "hello"), (2, "world")]
setB = [(1, "bar"), (3, "foo")]

*Main> fullOuterJoin setA setB
[(Just (1,"hello"),Just (1,"bar")),(Just (2,"world"),Nothing),(Nothing,Just (3, "foo"))]

(obviously with sort support

fullOuterJoin' xs ys = fullOuterJoin (sort xs) (sort ys)

As @Franky say, you can avoid if, for example

fullOuterJoin xs [] = [(Just  x, Nothing) | x <- xs]
fullOuterJoin [] ys = [(Nothing, Just  y) | y <- ys]
fullOuterJoin xss@(x:xs) yss@(y:ys) =
  case (compare `on` key) x y of
    EQ -> (Just  x, Just  y): fullOuterJoin xs  ys
    LT -> (Just  x, Nothing): fullOuterJoin xs  yss
    GT -> (Nothing, Just  y): fullOuterJoin xss ys

Use `case compare (key x) (key y)` for greater symmetry if you dislike nested `if` like me. — Franky, Jun 26 '14 at 08:21

score 1 · Accepted Answer · answered Jun 26 '14 at 07:30

1

I can not think of any standard function doing this operation. I would convert the two lists into Data.Map.Maps and code the SQL-join myself. In this way, it looks doable with O(n log n) complexity, which is not too bad.

answered Jun 26 '14 at 07:30

chi

111,837
3
133
218

score 1 · Answer 3 · answered Jun 26 '14 at 07:50

1

If you care about performance, this is not the answer you are looking for. No answer for a built-in function since you didn't give a type.

It can be done by a simple list comprehension

joinOnFst as bs = [(a,b) | a<-as, b<-bs, fst a == fst b]

or with pattern matching and a different return type

joinOnFst as bs = [(a1,a2,b2) | (a1,a2)<-as, (b1,b2)<-bs, a1==b1]

More abstract, you can define

listJoinBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a,b)]
listJoinBy comp as bs =  [(a,b) | a<-as, b<-bs, comp a b]

listJoin :: (Eq c) => (a -> c) -> (b -> c) -> [a] -> [b] -> [(a,b)]
listJoin fa fb = listJoinBy (\a b -> fa a == fb b)

I bet the last line can be made point-free or at least the lambda can be eliminated.

answered Jun 26 '14 at 07:50

Franky

2,339
2
18
27

What would be the answer for a build-in function if I was giving a type ? I'm open to any answer. – mb14 Jun 26 '14 at 08:10
The answer would be: Use hoogle! – Franky Jun 26 '14 at 09:54
this is O(n^2) because of `Eq` but `Int`s are `Ord` too. – Will Ness Jun 26 '14 at 10:26
I don't think, there is a nice pointfree representation. The best, I've come with, is `listJoin = (listJoinBy .) . (((uncurry (==) .) .) .) . (curry .) . (***)` – effectfully Jun 26 '14 at 16:20

score 1 · Answer 4 · answered Apr 19 '19 at 22:35

You're asking if there's a way to something like a SQL join in Haskell.

Rosetta code has an example in Haskell of how to do a hash join, which is the algorithm lots of RDBMS's use for JOINs, one of which is presented as cleaner (though it is slower):

a cleaner and more functional solution would be to use Data.Map (based on binary trees):

import qualified Data.Map as M
import Data.List
import Data.Maybe
import Control.Applicative

mapJoin xs fx ys fy = joined
  where yMap = foldl' f M.empty ys
        f m y = M.insertWith (++) (fy y) [y] m
        joined = concat .
                 mapMaybe (\x -> map (x,) <$> M.lookup (fx x) yMap) $ xs

main = mapM_ print $ mapJoin
    [(1, "Jonah"), (2, "Alan"), (3, "Glory"), (4, "Popeye")]
        snd
    [("Jonah", "Whales"), ("Jonah", "Spiders"), 
     ("Alan", "Ghosts"), ("Alan", "Zombies"), ("Glory", "Buffy")]
        fst

Will Ness · Answer 5 · 2015-11-30T12:10:15.097

What you asked is basically

ordzip a@(x:t) b@(y:r) = case compare x y of
    LT -> (Just  x, Nothing) : ordzip t b 
    EQ -> (Just  x, Just  y) : ordzip t r 
    GT -> (Nothing, Just  y) : ordzip a r 
ordzip a [] = [(Just x, Nothing) | x <- a]
ordzip [] b = [(Nothing, Just y) | y <- b]

With it, we can further define e.g.

import Control.Applicative (<|>) 

diff  xs ys = [x | (Just x, Nothing) <- ordzip xs ys]    -- set difference
meet  xs ys = [y | (Just _, Just  y) <- ordzip xs ys]    -- intersection
union xs ys = [z | (u,v) <- ordzip xs ys, let Just z = u <|> v]

or have few variations concerning the access to keys or handling of duplicates etc. (like defining ordzipBy k a@(x:t) b@(y:r) = case compare (k x) (k y) of ...).

But this is better represented with Data.These.These type to explicate the fact that (Nothing, Nothing) can never happen:

import Data.These

ordzip a@(x:t) b@(y:r) = case compare x y of
    LT -> This  x   : ordzip t b 
    GT -> That    y : ordzip a r 
    _  -> These x y : ordzip t r 

diff  xs ys = catThis                $ ordzip xs ys
meet  xs ys = map snd . catThese     $ ordzip xs ys -- or map fst, or id
union xs ys = map (mergeThese const) $ ordzip xs ys

Of course the input lists are to be sorted by the key beforehand.

Join or merge function in haskell

5 Answers5

Linked