2

If often need to zip two lists, discarding non matching elements (where "matching" is definied by comparing parts of the elements of the lists). For example:

let as = [(1,"a"), (2,"b"), (4,"c")]
let bs = [(2,"a"), (3,"b"), (4,"c"), (5, "d")]
zipWithAdjust (fst) (fst) as bs
-- ~> [((2,"b"),(2,"a")), ((4,"c"),(4,"c"))]

I implemented zipWithAdjust as follows:

zipWithAdjust :: (Ord c, Show a, Show b, Show c) => (a -> c) -> (b -> c) -> [a] -> [b] -> [(a,b)]
zipWithAdjust cmpValA cmpValB (a:as) (b:bs)
  | cmpValA a == cmpValB b = (a,b) : zipWithAdjust cmpValA cmpValB as bs
  | cmpValA a > cmpValB b = zipWithAdjust cmpValA cmpValB (a:as) bs
  | cmpValA a < cmpValB b = zipWithAdjust cmpValA cmpValB as (b:bs)
zipWithAdjust _ _ _ _ = []

It works fine, but I have a feeling that there is a standard way to do such zip. I found Data.Align and this SO Question but can't figure out how to use it for my use case.

Is there a standard way to do this (using library functions)? Is it Data.Align? If so, how do I implement the above function using Data.Align?

Edit: Changed the < case to get implementation matching with example.

Community
  • 1
  • 1
Sven Koschnicke
  • 6,523
  • 2
  • 34
  • 49
  • 1
    Are you sure you need the `>` case? If you do, it's not possible to express this function in terms of `zip`, because `zip` doesn't allow putting elements back. If you only need to drop some non-matching pairs out, there's a simple solution in terms of `zipWith` and `catMaybes` from [`Data.Maybe`](http://hackage.haskell.org/package/base-4.6.0.1/docs/Data-Maybe.html). – Sergei Lebedev Feb 10 '14 at 14:01
  • Yes I need the discarding of elements. Otherwise it would be a simple `zip`, correct. – Sven Koschnicke Feb 10 '14 at 14:08

4 Answers4

5

As far as I know, there's no such function. However, you could make your function more general by using (a -> b -> Ordering) instead of two additional functions:

zipWithAdjust :: (a -> b -> Ordering) -> [a] -> [b] -> [(a,b)]
zipWithAdjust cmp (a:as) (b:bs)
  | ord == LT = zipWithAdjust cmp as (b:bs)
  | ord == GT = zipWithAdjust cmp (a:as) (bs)
  | ord == EQ = (a,b) : zipWithAdjust cmp as bs
  where ord = cmp a b

zipWithAdjust _ _ _ = []

result =  zipWithAdjust (\x y -> compare (fst x) (fst y)) [(1,"a"), (2,"b"), (4,"c")] [(2,"a"), (3,"b"), (4,"c"), (5, "d")]

However, I wouldn't call this zip anymore, but something like compareMerge or similar.

Zeta
  • 103,620
  • 13
  • 194
  • 236
5

You might like Data.Map's intersection capabilities. It's slightly less capable in some ways and more capable in others. For example:

> let as = fromList [(1,"a"), (2,"b"), (4,"c")]; bs = fromList [(2,"a"), (3,"b"), (4,"c"), (5, "d")]
> intersectionWith (,) as bs
fromList [(2,("b","a")),(4,("c","c"))]
Daniel Wagner
  • 145,880
  • 9
  • 220
  • 380
4

I would say it's a bit more idiomatic to say

zipWithAdjust cmpA cmpB (a:as) (b:bs) =
    case cmpA a `compare` cmpB b of
        EQ -> (a, b) : zipWithAdjust cmpA cmpB    as     bs
        GT ->          zipWithAdjust cmpA cmpB (a:as)    bs
        LT ->          zipWithAdjust cmpA cmpB    as  (b:bs)
zipWithAdjust _ _ _ _ = []

It would certainly be faster, since it reduces the number of times you have to calculate cmpA a and cmpB b. This isn't truly a zip since you are filtering at the same time, and also offsetting in your GT and LT cases. I would say that this solution is perfectly fine as it is, there isn't a need to implement it using standard functions.

bheklilr
  • 53,530
  • 6
  • 107
  • 163
3

edit: using These a b type from Data.These (used by Data.Align), with this:

ordzipBy :: (Ord t) => (a -> t) -> (b -> t) -> [a] -> [b] -> [These a b]
ordzipBy f g a@(x:t) b@(y:r) = case compare (f x) (g y) of
    LT -> This  x   : ordzipBy f g t b
    GT -> That    y : ordzipBy f g a r
    EQ -> These x y : ordzipBy f g t r
ordzipBy _ _ a       []      = map This a
ordzipBy _ _ []      b       = map That b

we can express three set operations as:

diffBy :: (Ord t) => (a -> t) -> (b -> t)  -> [a] -> [b] -> [a]
meetBy :: (Ord t) => (a -> t) -> (b -> t)  -> [a] -> [b] -> [(a, b)]
joinBy :: (Ord t) => (a -> t) -> (a->a->a) -> [a] -> [a] -> [a]

diffBy f g xs ys = [x     | This  x   <- ordzipBy f g xs ys]
meetBy f g xs ys = [(x,y) | These x y <- ordzipBy f g xs ys]
joinBy f h xs ys = mergeThese h   `map`  ordzipBy f f xs ys

what you describe is meetBy, i.e. set intersection operation, with the two ordered lists seen as sets.

The ability of a compiler to efficiently compile these definitions is another question though. The three set functions hand-coded along the lines of ordzipBy might run faster.

ordzipBy f g is compatible with align, and [] with nil, but the type machinery involved in making it happen is above my pay grade. :) Also, it's not clear to me whether the law align (f <$> xs) (g <$> ys) = bimap f g <$> align xs ys would make sense at all because mapping the functions f and g can very well change the mutual ordering of elements of xs and ys.

The two problems (the types, and the law) are related: the parts of data recovered by selector functions for ordering purposes serve as positions, as shape, yet are part of the original data. (cf. instance Alternative ZipList in Haskell?).


update: see if the following works as you expected.

{-# LANGUAGE InstanceSigs, DatatypeContexts #-}
import Data.These
import Data.Align

newtype Ord a => ZL a b = ZL {unzl :: [(a,b)]}
      deriving (Eq, Show)

instance Ord a => Functor (ZL a) where
  fmap f (ZL xs) = ZL [(k, f v) | (k,v)<-xs]

instance Ord a => Align (ZL a) where 
  nil = ZL []
  align :: (ZL a b) -> (ZL a c) -> (ZL a (These b c))
  align (ZL a) (ZL b) = ZL (g a b) where
    g a@((k,x):t) b@((n,y):r) = case compare k n of
        LT -> (k, This  x  ) : g t b 
        GT -> (n, That    y) : g a r
        EQ -> (k, These x y) : g t r  
    g a               []              = [(k, This x) | (k,x) <- a]
    g []              b               = [(n, That y) | (n,y) <- b]

diffBy :: (Ord t) => (a -> t) -> (b -> t)  -> [a] -> [b] -> [a]
meetBy :: (Ord t) => (a -> t) -> (b -> t)  -> [a] -> [b] -> [(a, b)]
joinBy :: (Ord t) => (a -> t) -> (a->a->a) -> [a] -> [a] -> [a]

diffBy f g xs ys = catThis        . map snd . unzl 
                    $ align (ZL [(f x,x) | x<-xs]) (ZL [(g y,y) | y<-ys])
meetBy f g xs ys = catThese       . map snd . unzl 
                    $ align (ZL [(f x,x) | x<-xs]) (ZL [(g y,y) | y<-ys])
joinBy f h xs ys = map (mergeThese h . snd) . unzl 
                    $ align (ZL [(f x,x) | x<-xs]) (ZL [(f y,y) | y<-ys])

Infinite lists aren't handled well though, while the hand-coded functions can obviously quite easily be made to handle such cases correctly:

*Main> diffBy id id [1..5] [4..9]
[1,2,3]
*Main> diffBy id id [1..5] [4..]
[1,2,3Interrupted.

*Main> meetBy id id [1,3..10] [2,5..20]
[(5,5)]
*Main> joinBy id const [1,3..10] [2,5..20]
[1,2,3,5,7,8,9,11,14,17,20]
Community
  • 1
  • 1
Will Ness
  • 70,110
  • 9
  • 98
  • 181
  • That helps to understand Data.These, thank you! If I get it right, I still have to implement `ordzipBy`, correct? That would be not much gain over my `zipWithAdjust` (or the more general implementation suggested here). – Sven Koschnicke Feb 11 '14 at 09:40
  • right, as I said, either define all through `ordzipBy`, or hand code them which usually presents possibilities for improvement - like e.g. `diffBy` which only responds to `This`, so why produce `That` or `These` at all. Unless GHC is "smart" enough. :) -- and thank *you* for bringing `These` up :), I previously used `(Maybe a, Maybe b)` there and `These a b` is better! -- so the problem is the separation of shape and payload; maybe if you'd change your types into explicit pairs `(explicitly_unchangeable_position_index, datum)`, Data.Align would become possible to use. – Will Ness Feb 11 '14 at 09:45
  • erhm, `(explicitly_unchangeable_position_indicator-producing_value, datum)`. the selectors would have to be confined to acting only on `fst` parts of such pairs (on type level). – Will Ness Feb 11 '14 at 09:53
  • The problems with typing are stemming from the fact that `[]` is a list, but you need a _sorted_ list. – Sassa NF Feb 11 '14 at 12:35
  • @SassaNF even if unsorted, it'd just be inconsistent results. I've made the lists explicitly ordered(-able), check it out. :) – Will Ness Feb 11 '14 at 16:55
  • Yes, that works, thank you! I added some needed imports to your code. Now I just have to understand it... ;) – Sven Koschnicke Feb 12 '14 at 10:01
  • @SvenKoschnicke great. :) (I wouldn't edit my answer anymore as it's getting dangerously close to the 10 edits limit, which will turn it into pumpkin, i.e. community wiki :) ). It's really the same code, just repackaged to fit into the Data.Align mold. And of course `diffBy` is [easy to hand-code so that it stops when the 1st arg is exhausted](https://en.wikipedia.org/wiki/Haskell_98_features#minus). I see no benefits so far in using this type class here. The `These` type itself is a nifty thing though. – Will Ness Feb 12 '14 at 10:37
  • @SvenKoschnicke `&&&` just makes pairs, `(f&&&id) x = (f x, x)`, and `ZL` is just a marker (`newtype ... ZL ... = ` can be used instead of `data`). `catThis` etc. just do what my first code did, `[... | This x <- ...]`. – Will Ness Feb 12 '14 at 10:43
  • `ZL` marks the lists as ordered, right? I'm surprised that there is no standard type for ordered lists. Or is there? – Sven Koschnicke Feb 12 '14 at 12:06
  • Yes, but not lists `:: [a]`, rather `:: [(a,b)]` as ordered according to `:: a` field. -- I don't think so. There's a package, [data-ordlist](http://hackage.haskell.org/package/data-ordlist), to handle ordered (non-decreasing) lists (`:: [a]`). -- I'm making one more edit to make the code a little bit clearer, incorporating your edit as well. – Will Ness Feb 12 '14 at 14:26