merge lists with intersection

Question

Given that:

g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]

How can I compare each list within g so that for lists sharing anyone common number can merge to a set?

e.g.
0 exists in g[2] and g[4] so they merge to a set {0,2,3,7}

I have tried the following but it doesn't work:

for i in g:
    for j in g:
        if k in i == l in j:
            m=set(i+j)

I want to make the largest possible set.

Welcome to scicomp. As your question is rather python specific I suggest emigrate it to the 'stackoverflow' page... — Jan, Jan 06 '15 at 08:04
Does this answer your question? [Python: simple list merging based on intersections](https://stackoverflow.com/questions/9110837/python-simple-list-merging-based-on-intersections) — Estif, Oct 10 '22 at 20:13

Mazdak · Answer 1 · 2015-06-11T08:02:33.713

As a much faster way You can first create a list of the set of items with len more than one (s) . then go through your list and update in place with union function !

s=map(set,g)
def find_intersection(m_list):
    for i,v in enumerate(m_list) : 
        for j,k in enumerate(m_list[i+1:],i+1):
           if v & k:
              m_list[i]=v.union(m_list.pop(j))
              return find_intersection(m_list)
    return m_list

Demo :

g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]
s=map(set,g)
print find_intersection(s)

[set([0, 2, 3, 7]), set([1, 4, 5, 6])]

g=[[1,2,3],[3,4,5],[5,6],[6,7],[9,10],[10,11]]
s=map(set,g)
print find_intersection(s)

[set([1, 2, 3, 4, 5, 6, 7]), set([9, 10, 11])]

g=[[], [1], [0,2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]
s=map(set,g)
print find_intersection(s)

[set([1, 4, 5, 6]), set([0, 2, 3, 7])]

Benchmark with @Mark's answer :

from timeit import timeit


s1="""g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]
sets = [set(i+j) for i in g for j in g if i!=j and (set(i) & set(j))]
    """
s2="""g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]

s=map(set,g)

def find_intersection(m_list):
    for i,v in enumerate(m_list) : 
        for j,k in enumerate(m_list[i+1:],i+1):
           if v & k:
              s[i]=v.union(m_list.pop(j))
              return find_intersection(m_list)
    return m_list 
    """

print ' first: ' ,timeit(stmt=s1, number=100000)
print 'second : ',timeit(stmt=s2, number=100000)

first:  3.8284008503
second :  0.213887929916

Your algorithm is right but there is a flaw in its implementation: Try with `g=[[], [1], [0,2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]` The problem is the use of `s[i:]` which creates a copy of a slice of `s` not getting updated by `s.pop(t)` — Pablo Francisco Pérez Hidalgo, Jan 06 '15 at 18:50

score 1 · Answer 2 · answered Jan 06 '15 at 16:27

1

Here's a quickie that will give a list of all the sets that intersect:

sets = [set(i+j) for i in g for j in g if i!=j and (set(i) & set(j))]

Note that each result will be repeated, since each list is being compared twice, once on the left and once on the right.

answered Jan 06 '15 at 16:27

Mark Ransom

299,747
42
398
622

2

In that case, you might want to do: `set(frozenset(i+j) for i in g for j in g if i!=j and (set(i) & set(j)))`. It will get rid of the duplicates (assuming that is what the OP wants). – Jan 06 '15 at 16:31
Sorry, can I ask what is (set(i) & set(j)? – James Harroid Jan 06 '15 at 16:31
@NaLai `i` and `j` are entries in `g`. They are lists. `set(i)` and `set(j)` are the set versions of those lists. `set(i) & set(j)` is the intersection of `set(i)` and `set(j)`. Correspondingly, `|` would be intersection, and `^` would be symmetric difference. – senshin Jan 06 '15 at 16:33
@NaLai see http://stackoverflow.com/questions/642763/python-intersection-of-two-lists. If the intersection of the sets is empty the `if` will evaluate false, otherwise it will be true. – Mark Ransom Jan 06 '15 at 16:53
@iCodez How can I extract the set in frozenset without the word 'frozenset' in it? – James Harroid Jan 06 '15 at 17:34
@NaLai - The word "frozenset" is only in the string representation. You could convert the frozensets into normal sets with `set()`, but I doubt that is necessary (unless you plan to add items to the sets). – Jan 06 '15 at 17:37

Pablo Francisco Pérez Hidalgo · Answer 3 · 2015-01-09T18:55:28.287

If g or g's elements are huge you can use Disjoint Sets to boost efficiency.

This data structure can be used to classify each element into the set it should belong to.

The first step is to build a Disjoint Set collection with all g sets labeled by their index in g:

g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7],[99]]
g = map(set, g)
dss = CDisjointSets()
for i in xrange(len(g)):
    dss.MakeSet(i)

Then, the sets gets joined whenever the intersection is not empty:

for i in xrange(len(g)):
    for j in xrange(i+1, len(g)):
        if g[i].intersection(g[j]):
            dss.Join(i,j)

At this point dss gives you a common label for sets of g that should joined together:

print(dss)

parent(0) = 0 parent(1) = 1 parent(2) = 2 parent(3) = 3 parent(4) = 2 parent(5) = 3 parent(6) = 3 parent(7) = 7 parent(8) = 8 parent(9) = 2 parent(10) = 10

Now you just have to build the new sets joining those which have the same label:

l2set = dict()
for i in xrange(len(g)):
    label = dss.FindLabel(i).getLabel()
    l2set[label] = l2set.get(label, set()).union(g[i])
print(l2set)

Resulting in:

{0: set([]), 1: set([]), 2: set([0, 2, 3, 7]), 3: set([1, 4, 5, 6]), 7: set([]), 8:   set([]), 10: set([99])}

This is the implementation of Disjoint Sets I used but surely you can find anotherone with a better sintax:

""" Disjoint Sets
    -------------
    Pablo Francisco Pérez Hidalgo
    December,2012. """
class CDisjointSets:

    #Class to represent each set
    class DSet:
        def __init__(self, label_value):
            self.__label = label_value
            self.rank = 1
            self.parent = self
        def getLabel(self):
            return self.__label

    #CDisjointSets Private attributes
    __sets = None

    #CDisjointSets Constructors and public methods.
    def __init__(self):
        self.__sets = {}

    def MakeSet(self, label):
        if label in self.__sets: #This check slows the operation a lot,
            return False         #it should be removed if it is sure that
                                 #two sets with the same label are not goind
                                 #to be created.
        self.__sets[label] = self.DSet(label)

    #Pre: 'labelA' and 'labelB' are labels or existing disjoint sets.
    def Join(self, labelA, labelB):
        a = self.__sets[labelA]
        b = self.__sets[labelB]
        pa = self.Find(a)
        pb = self.Find(b)
        if pa == pb: 
            return #They are already joined
        parent = pa
        child = pb
        if pa.rank < pb.rank:
            parent = pb
            child = pa
        child.parent = parent
        parent.rank = max(parent.rank, child.rank+1)

    def Find(self,x):
        if x == x.parent:
            return x
        x.parent = self.Find(x.parent)
        return x.parent

    def FindLabel(self, label):
        return self.Find(self.__sets[label])

    def __str__(self):
        ret = ""
        for e in self.__sets:
            ret = ret + "parent("+self.__sets[e].getLabel().__str__()+") = "+self.FindLabel(e).parent.getLabel().__str__() + "\n"
        return ret

merge lists with intersection

3 Answers3

Linked