Merge multiple lists if condition is true

Question

I've been trying to wrap my head around this for a while now, but it seems like my lack of Haskell experience just won't get me through it. I couldn't find a similar question here on Stackoverflow (most of them are related to merging all sublists, without any condition)

So here it goes. Let's say I have a list of lists like this:

[[1, 2, 3], [3, 5, 6], [20, 21, 22]]

Is there an efficient way to merge lists if some sort of condition is true? Let's say I need to merge lists that share at least one element. In case of example, result would be:

[[1, 2, 3, 3, 5, 6], [20, 21, 22]]

Another example (when all lists can be merged):

[[1, 2], [2, 3], [3, 4]]

And it's result:

[[1, 2, 2, 3, 3, 4]]

Thanks for your help!

One wild guess: This sounds like your implementing some kind of disjoint-union data structure. Lists are not a good way to represent this. Can you tell us exactly what you are trying to achieve here? — Niklas B., May 09 '13 at 14:13
@NiklasB The example provided was pretty much what I was trying to do - no particular use. I'll make sure to do some research on disjoint-union. So I guess there is no half-decent way to do it with lists? — Alvydas, May 09 '13 at 14:56
@Alvydas: I don't think so, merging lists is a costly (linear) operation, so the naive algorithm is going to be quadratic or worse. Maybe that's okay for your use case, though. — Niklas B., May 09 '13 at 15:05
If you absolutely have to do it with lists, the general approach is to take the first list, iterate through the list appending every list with a common element and repeat until no more lists are added (since you could have `[[1,2],[3,4],[1,4]]`, where the first pass does not pick up everything). Recurse over the sublists that did not join with the first. But I would look for a better data structure... — isturdy, May 09 '13 at 15:13

score 4 · Accepted Answer · answered May 09 '13 at 15:48

I don't know what to say about efficiency, but we can break down what's going on and get several different functionalities at least. Particular functionalities might be optimizable, but it's important to clarify exactly what's needed.

Let me rephrase the question: For some set X, some binary relation R, and some binary operation +, produce a set Q = {x+y | x in X, y in X, xRy}. So for your example, we might have X being some set of lists, R being "xRy if and only if there's at least one element in both x and y", and + being ++.

A naive implementation might just copy the set-builder notation itself

shareElement :: Eq a => [a] -> [a] -> Bool
shareElement xs ys = or [x == y | x <- xs, y <- ys]

v1 :: (a -> a -> Bool) -> (a -> a -> b) -> [a] -> [b]
v1 (?) (<>) xs = [x <> y | x <- xs, y <- xs, x ? y]

then p = v1 shareElement (++) :: Eq a => [[a]] -> [[a]] might achieve what you want. Except it probably doesn't.

Prelude> p [[1], [1]]
[[1,1],[1,1],[1,1],[1,1]]

The most obvious problem is that we get four copies: two from merging the lists with themselves, two from merging the lists with each other "in both directions". The problem occurs because List isn't the same as Set so we can't kill uniques. Of course, that's an easy fix, we'll just use Set everywhere

import Data.Set as Set

v2 :: (a -> a -> Bool) -> (a -> a -> b) -> Set.Set a -> Set.Set b
v2 (?) (<>) = Set.fromList . v1 (?) (<>) . Set.toList

So we can try again, p = v2 (shareElementonSet.toList) Set.union with

Prelude Set> p $ Set.fromList $ map Set.fromList [[1,2], [2,1]]
fromList [fromList [1,2]]

which seems to work. Note that we have to "go through" List because Set can't be made an instance of Monad or Applicative due to its Ord constraint.

I'd also note that there's a lot of lost behavior in Set. For instance, we fight either throwing away order information in the list or having to handle both x <> y and y <> x when our relation is symmetric.

Some more convenient versions can be written like

v3 :: Monoid a => (a -> a -> Bool) -> [a] -> [a]
v3 r = v2 r mappend

and more efficient ones can be built if we assume that the relationship is, say, an equality relation since then instead of having an O(n^2) operation we can do it in O(nd) where d is the number of partitions (cosets) of the relation.

Generally, it's a really interesting problem.

score 2 · Answer 2 · edited May 23 '17 at 12:03

I just happened to write something similar here: Finding blocks in arrays

You can just modify it so (although I'm not too sure about the efficiency):

import Data.List (delete, intersect) 

example1 = [[1, 2, 3], [3, 5, 6], [20, 21, 22]]
example2 = [[1, 2], [2, 3], [3, 4]]

objects zs = map concat . solve zs $ [] where
  areConnected x y = not . null . intersect x $ y
  solve []     result = result
  solve (x:xs) result =
    let result' = solve' xs [x]
    in solve (foldr delete xs result') (result':result) where
  solve' xs result =
    let ys = filter (\y -> any (areConnected y) result) xs
    in if null ys 
          then result
          else solve' (foldr delete xs ys) (ys ++ result)

OUTPUT:

*Main> objects example1
[[20,21,22],[3,5,6,1,2,3]]

*Main> objects example2
[[3,4,2,3,1,2]]

Merge multiple lists if condition is true

2 Answers2