2

Is there a function in Haskell to do the equivalent of an SQL join or an R merge ?

Basically I have two list of tuples and would like to 'zip' them according to their key. I know there is only one or zero value for each key

a = [(1, "hello"), (2, "world")]
b = [(3, "foo"), (1, "bar")]

And get something like

 [ (Just (1, "hello), Just (1, "bar))
 , (Just (2, "world), Nothing)
 , (Nothing         , Just (3, "foo"))
 ]
amalloy
  • 89,153
  • 8
  • 140
  • 205
mb14
  • 22,276
  • 7
  • 60
  • 102
  • 1
    Can you write down the `type` of the output you want ? – Sibi Jun 26 '14 at 07:11
  • Types could be either `[a] -> [b] -> [(Maybe a), (Maybe b)]` or `[a] -> [b] -> [([a], (b)]` to manage collision. – mb14 Jun 26 '14 at 07:21
  • I'd expect `listJoinBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a,b)]` or `listJoin :: (Eq c) => (a -> c) -> (b -> c) -> [a] -> [b] -> [(a,b)]`. The latter would be useful with record syntax and can be implemented in terms of the former. – Franky Jun 26 '14 at 07:34
  • @Franky with only `Eq` restriction the minimum complexity is `O(n^2)`, **when possible**, is better restrict to `Ord` (with complexity `O(n log n)`) – josejuan Jun 26 '14 at 07:45
  • 2
    this is better represented with `Data.These` since `(Nothing, Nothing)` is impossible. Related thread: http://stackoverflow.com/questions/18210765/instance-alternative-ziplist-in-haskell – Will Ness Jun 26 '14 at 08:59

5 Answers5

3

With ordered sets (lists) and Ord key

key = fst

fullOuterJoin xs [] = map (\x -> (Just x, Nothing)) xs
fullOuterJoin [] ys = map (\y -> (Nothing, Just y)) ys
fullOuterJoin xss@(x:xs) yss@(y:ys) =
  if key x == key y
    then (Just x, Just y): fullOuterJoin xs ys
    else
      if key x < key y
        then (Just x, Nothing): fullOuterJoin xs yss
        else (Nothing, Just y): fullOuterJoin xss ys

(complexity is O(n+m) but if you must to sort then is O(n log n + m log m))

example

setA = [(1, "hello"), (2, "world")]
setB = [(1, "bar"), (3, "foo")]

*Main> fullOuterJoin setA setB
[(Just (1,"hello"),Just (1,"bar")),(Just (2,"world"),Nothing),(Nothing,Just (3, "foo"))]

(obviously with sort support

fullOuterJoin' xs ys = fullOuterJoin (sort xs) (sort ys)

As @Franky say, you can avoid if, for example

fullOuterJoin xs [] = [(Just  x, Nothing) | x <- xs]
fullOuterJoin [] ys = [(Nothing, Just  y) | y <- ys]
fullOuterJoin xss@(x:xs) yss@(y:ys) =
  case (compare `on` key) x y of
    EQ -> (Just  x, Just  y): fullOuterJoin xs  ys
    LT -> (Just  x, Nothing): fullOuterJoin xs  yss
    GT -> (Nothing, Just  y): fullOuterJoin xss ys
josejuan
  • 9,338
  • 24
  • 31
  • 1
    Use `case compare (key x) (key y)` for greater symmetry if you dislike nested `if` like me. – Franky Jun 26 '14 at 08:21
1

I can not think of any standard function doing this operation. I would convert the two lists into Data.Map.Maps and code the SQL-join myself. In this way, it looks doable with O(n log n) complexity, which is not too bad.

chi
  • 111,837
  • 3
  • 133
  • 218
1

If you care about performance, this is not the answer you are looking for. No answer for a built-in function since you didn't give a type.

It can be done by a simple list comprehension

joinOnFst as bs = [(a,b) | a<-as, b<-bs, fst a == fst b]

or with pattern matching and a different return type

joinOnFst as bs = [(a1,a2,b2) | (a1,a2)<-as, (b1,b2)<-bs, a1==b1]

More abstract, you can define

listJoinBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a,b)]
listJoinBy comp as bs =  [(a,b) | a<-as, b<-bs, comp a b]

listJoin :: (Eq c) => (a -> c) -> (b -> c) -> [a] -> [b] -> [(a,b)]
listJoin fa fb = listJoinBy (\a b -> fa a == fb b)

I bet the last line can be made point-free or at least the lambda can be eliminated.

Franky
  • 2,339
  • 2
  • 18
  • 27
1

You're asking if there's a way to something like a SQL join in Haskell.

Rosetta code has an example in Haskell of how to do a hash join, which is the algorithm lots of RDBMS's use for JOINs, one of which is presented as cleaner (though it is slower):

a cleaner and more functional solution would be to use Data.Map (based on binary trees):

import qualified Data.Map as M
import Data.List
import Data.Maybe
import Control.Applicative

mapJoin xs fx ys fy = joined
  where yMap = foldl' f M.empty ys
        f m y = M.insertWith (++) (fy y) [y] m
        joined = concat .
                 mapMaybe (\x -> map (x,) <$> M.lookup (fx x) yMap) $ xs

main = mapM_ print $ mapJoin
    [(1, "Jonah"), (2, "Alan"), (3, "Glory"), (4, "Popeye")]
        snd
    [("Jonah", "Whales"), ("Jonah", "Spiders"), 
     ("Alan", "Ghosts"), ("Alan", "Zombies"), ("Glory", "Buffy")]
        fst
djhaskin987
  • 9,741
  • 4
  • 50
  • 86
0

What you asked is basically

ordzip a@(x:t) b@(y:r) = case compare x y of
    LT -> (Just  x, Nothing) : ordzip t b 
    EQ -> (Just  x, Just  y) : ordzip t r 
    GT -> (Nothing, Just  y) : ordzip a r 
ordzip a [] = [(Just x, Nothing) | x <- a]
ordzip [] b = [(Nothing, Just y) | y <- b]

With it, we can further define e.g.

import Control.Applicative (<|>) 

diff  xs ys = [x | (Just x, Nothing) <- ordzip xs ys]    -- set difference
meet  xs ys = [y | (Just _, Just  y) <- ordzip xs ys]    -- intersection
union xs ys = [z | (u,v) <- ordzip xs ys, let Just z = u <|> v] 

or have few variations concerning the access to keys or handling of duplicates etc. (like defining ordzipBy k a@(x:t) b@(y:r) = case compare (k x) (k y) of ...).

But this is better represented with Data.These.These type to explicate the fact that (Nothing, Nothing) can never happen:

import Data.These

ordzip a@(x:t) b@(y:r) = case compare x y of
    LT -> This  x   : ordzip t b 
    GT -> That    y : ordzip a r 
    _  -> These x y : ordzip t r 

diff  xs ys = catThis                $ ordzip xs ys
meet  xs ys = map snd . catThese     $ ordzip xs ys -- or map fst, or id
union xs ys = map (mergeThese const) $ ordzip xs ys 

Of course the input lists are to be sorted by the key beforehand.

Will Ness
  • 70,110
  • 9
  • 98
  • 181