Standard way to zip two lists using custom comparator functions

Question

If often need to zip two lists, discarding non matching elements (where "matching" is definied by comparing parts of the elements of the lists). For example:

let as = [(1,"a"), (2,"b"), (4,"c")]
let bs = [(2,"a"), (3,"b"), (4,"c"), (5, "d")]
zipWithAdjust (fst) (fst) as bs
-- ~> [((2,"b"),(2,"a")), ((4,"c"),(4,"c"))]

I implemented zipWithAdjust as follows:

zipWithAdjust :: (Ord c, Show a, Show b, Show c) => (a -> c) -> (b -> c) -> [a] -> [b] -> [(a,b)]
zipWithAdjust cmpValA cmpValB (a:as) (b:bs)
  | cmpValA a == cmpValB b = (a,b) : zipWithAdjust cmpValA cmpValB as bs
  | cmpValA a > cmpValB b = zipWithAdjust cmpValA cmpValB (a:as) bs
  | cmpValA a < cmpValB b = zipWithAdjust cmpValA cmpValB as (b:bs)
zipWithAdjust _ _ _ _ = []

It works fine, but I have a feeling that there is a standard way to do such zip. I found Data.Align and this SO Question but can't figure out how to use it for my use case.

Is there a standard way to do this (using library functions)? Is it Data.Align? If so, how do I implement the above function using Data.Align?

Edit: Changed the < case to get implementation matching with example.

Are you sure you need the `>` case? If you do, it's not possible to express this function in terms of `zip`, because `zip` doesn't allow putting elements back. If you only need to drop some non-matching pairs out, there's a simple solution in terms of `zipWith` and `catMaybes` from [`Data.Maybe`](http://hackage.haskell.org/package/base-4.6.0.1/docs/Data-Maybe.html). — Sergei Lebedev, Feb 10 '14 at 14:01
Yes I need the discarding of elements. Otherwise it would be a simple `zip`, correct. — Sven Koschnicke, Feb 10 '14 at 14:08

score 5 · Answer 1 · answered Feb 10 '14 at 14:04

As far as I know, there's no such function. However, you could make your function more general by using (a -> b -> Ordering) instead of two additional functions:

zipWithAdjust :: (a -> b -> Ordering) -> [a] -> [b] -> [(a,b)]
zipWithAdjust cmp (a:as) (b:bs)
  | ord == LT = zipWithAdjust cmp as (b:bs)
  | ord == GT = zipWithAdjust cmp (a:as) (bs)
  | ord == EQ = (a,b) : zipWithAdjust cmp as bs
  where ord = cmp a b

zipWithAdjust _ _ _ = []

result =  zipWithAdjust (\x y -> compare (fst x) (fst y)) [(1,"a"), (2,"b"), (4,"c")] [(2,"a"), (3,"b"), (4,"c"), (5, "d")]

However, I wouldn't call this zip anymore, but something like compareMerge or similar.

score 5 · Answer 2 · answered Feb 10 '14 at 18:11

You might like Data.Map's intersection capabilities. It's slightly less capable in some ways and more capable in others. For example:

> let as = fromList [(1,"a"), (2,"b"), (4,"c")]; bs = fromList [(2,"a"), (3,"b"), (4,"c"), (5, "d")]
> intersectionWith (,) as bs
fromList [(2,("b","a")),(4,("c","c"))]

bheklilr · Answer 3 · 2014-02-10T14:24:09.720

I would say it's a bit more idiomatic to say

zipWithAdjust cmpA cmpB (a:as) (b:bs) =
    case cmpA a `compare` cmpB b of
        EQ -> (a, b) : zipWithAdjust cmpA cmpB    as     bs
        GT ->          zipWithAdjust cmpA cmpB (a:as)    bs
        LT ->          zipWithAdjust cmpA cmpB    as  (b:bs)
zipWithAdjust _ _ _ _ = []

It would certainly be faster, since it reduces the number of times you have to calculate cmpA a and cmpB b. This isn't truly a zip since you are filtering at the same time, and also offsetting in your GT and LT cases. I would say that this solution is perfectly fine as it is, there isn't a need to implement it using standard functions.

score 3 · Accepted Answer · edited May 23 '17 at 10:25

edit: using These a b type from Data.These (used by Data.Align), with this:

ordzipBy :: (Ord t) => (a -> t) -> (b -> t) -> [a] -> [b] -> [These a b]
ordzipBy f g a@(x:t) b@(y:r) = case compare (f x) (g y) of
    LT -> This  x   : ordzipBy f g t b
    GT -> That    y : ordzipBy f g a r
    EQ -> These x y : ordzipBy f g t r
ordzipBy _ _ a       []      = map This a
ordzipBy _ _ []      b       = map That b

we can express three set operations as:

diffBy :: (Ord t) => (a -> t) -> (b -> t)  -> [a] -> [b] -> [a]
meetBy :: (Ord t) => (a -> t) -> (b -> t)  -> [a] -> [b] -> [(a, b)]
joinBy :: (Ord t) => (a -> t) -> (a->a->a) -> [a] -> [a] -> [a]

diffBy f g xs ys = [x     | This  x   <- ordzipBy f g xs ys]
meetBy f g xs ys = [(x,y) | These x y <- ordzipBy f g xs ys]
joinBy f h xs ys = mergeThese h   `map`  ordzipBy f f xs ys

what you describe is meetBy, i.e. set intersection operation, with the two ordered lists seen as sets.

The ability of a compiler to efficiently compile these definitions is another question though. The three set functions hand-coded along the lines of ordzipBy might run faster.

ordzipBy f g is compatible with align, and [] with nil, but the type machinery involved in making it happen is above my pay grade. :) Also, it's not clear to me whether the law align (f <$> xs) (g <$> ys) = bimap f g <$> align xs ys would make sense at all because mapping the functions f and g can very well change the mutual ordering of elements of xs and ys.

The two problems (the types, and the law) are related: the parts of data recovered by selector functions for ordering purposes serve as positions, as shape, yet are part of the original data. (cf. instance Alternative ZipList in Haskell?).

update: see if the following works as you expected.

{-# LANGUAGE InstanceSigs, DatatypeContexts #-}
import Data.These
import Data.Align

newtype Ord a => ZL a b = ZL {unzl :: [(a,b)]}
      deriving (Eq, Show)

instance Ord a => Functor (ZL a) where
  fmap f (ZL xs) = ZL [(k, f v) | (k,v)<-xs]

instance Ord a => Align (ZL a) where 
  nil = ZL []
  align :: (ZL a b) -> (ZL a c) -> (ZL a (These b c))
  align (ZL a) (ZL b) = ZL (g a b) where
    g a@((k,x):t) b@((n,y):r) = case compare k n of
        LT -> (k, This  x  ) : g t b 
        GT -> (n, That    y) : g a r
        EQ -> (k, These x y) : g t r  
    g a               []              = [(k, This x) | (k,x) <- a]
    g []              b               = [(n, That y) | (n,y) <- b]

diffBy :: (Ord t) => (a -> t) -> (b -> t)  -> [a] -> [b] -> [a]
meetBy :: (Ord t) => (a -> t) -> (b -> t)  -> [a] -> [b] -> [(a, b)]
joinBy :: (Ord t) => (a -> t) -> (a->a->a) -> [a] -> [a] -> [a]

diffBy f g xs ys = catThis        . map snd . unzl 
                    $ align (ZL [(f x,x) | x<-xs]) (ZL [(g y,y) | y<-ys])
meetBy f g xs ys = catThese       . map snd . unzl 
                    $ align (ZL [(f x,x) | x<-xs]) (ZL [(g y,y) | y<-ys])
joinBy f h xs ys = map (mergeThese h . snd) . unzl 
                    $ align (ZL [(f x,x) | x<-xs]) (ZL [(f y,y) | y<-ys])

Infinite lists aren't handled well though, while the hand-coded functions can obviously quite easily be made to handle such cases correctly:

*Main> diffBy id id [1..5] [4..9]
[1,2,3]
*Main> diffBy id id [1..5] [4..]
[1,2,3Interrupted.

*Main> meetBy id id [1,3..10] [2,5..20]
[(5,5)]
*Main> joinBy id const [1,3..10] [2,5..20]
[1,2,3,5,7,8,9,11,14,17,20]

That helps to understand Data.These, thank you! If I get it right, I still have to implement `ordzipBy`, correct? That would be not much gain over my `zipWithAdjust` (or the more general implementation suggested here). — Sven Koschnicke, Feb 11 '14 at 09:40
right, as I said, either define all through `ordzipBy`, or hand code them which usually presents possibilities for improvement - like e.g. `diffBy` which only responds to `This`, so why produce `That` or `These` at all. Unless GHC is "smart" enough. :) -- and thank *you* for bringing `These` up :), I previously used `(Maybe a, Maybe b)` there and `These a b` is better! -- so the problem is the separation of shape and payload; maybe if you'd change your types into explicit pairs `(explicitly_unchangeable_position_index, datum)`, Data.Align would become possible to use. — Will Ness, Feb 11 '14 at 09:45
erhm, `(explicitly_unchangeable_position_indicator-producing_value, datum)`. the selectors would have to be confined to acting only on `fst` parts of such pairs (on type level). — Will Ness, Feb 11 '14 at 09:53
The problems with typing are stemming from the fact that `[]` is a list, but you need a _sorted_ list. — Sassa NF, Feb 11 '14 at 12:35
@SassaNF even if unsorted, it'd just be inconsistent results. I've made the lists explicitly ordered(-able), check it out. :) — Will Ness, Feb 11 '14 at 16:55
Yes, that works, thank you! I added some needed imports to your code. Now I just have to understand it... ;) — Sven Koschnicke, Feb 12 '14 at 10:01
@SvenKoschnicke great. :) (I wouldn't edit my answer anymore as it's getting dangerously close to the 10 edits limit, which will turn it into pumpkin, i.e. community wiki :) ). It's really the same code, just repackaged to fit into the Data.Align mold. And of course `diffBy` is [easy to hand-code so that it stops when the 1st arg is exhausted](https://en.wikipedia.org/wiki/Haskell_98_features#minus). I see no benefits so far in using this type class here. The `These` type itself is a nifty thing though. — Will Ness, Feb 12 '14 at 10:37
@SvenKoschnicke `&&&` just makes pairs, `(f&&&id) x = (f x, x)`, and `ZL` is just a marker (`newtype ... ZL ... = ` can be used instead of `data`). `catThis` etc. just do what my first code did, `[... | This x <- ...]`. — Will Ness, Feb 12 '14 at 10:43
`ZL` marks the lists as ordered, right? I'm surprised that there is no standard type for ordered lists. Or is there? — Sven Koschnicke, Feb 12 '14 at 12:06
Yes, but not lists `:: [a]`, rather `:: [(a,b)]` as ordered according to `:: a` field. -- I don't think so. There's a package, [data-ordlist](http://hackage.haskell.org/package/data-ordlist), to handle ordered (non-decreasing) lists (`:: [a]`). -- I'm making one more edit to make the code a little bit clearer, incorporating your edit as well. — Will Ness, Feb 12 '14 at 14:26

Standard way to zip two lists using custom comparator functions

4 Answers4