0

I have 2D list in python

list = [[9, 2, 7], [9, 7], [2, 7], [1, 0], [0, 5, 4]]

I would like to get union of list items if there occurs any intersection. For example [9, 2, 7], [9, 7], [2, 7] has intersection of more than one digit. The union of this would be [9,2,7].

How can i get the final list as follows in efficient way ?

finalList = [[9,2,7], [0, 1, 5, 4]]

N.B. order of numbers is not important.

World
  • 2,007
  • 7
  • 27
  • 37

6 Answers6

5

Here is a theoretical answer: This is a connected component problem: you build a graph as follows:

  • there is a vertex for each set is the list
  • there is an edge between two sets when they have a common value.

what you want is the union of the connected components of the graph.

hivert
  • 10,579
  • 3
  • 31
  • 56
  • Here is a question before giving a practical answer: how fast do you want this to be ? – hivert Jul 23 '13 at 07:41
  • It seems more useful to me to define the graph in such a way that the vertices are elements of the sublists, and two vertices have an edge between them if they are members of the same sublist. Just finding the edges of the graph with your current definition is expensive. – user2357112 Jul 23 '13 at 07:49
2

You have a graph problem. You want to build connected components in a graph whose vertices are elements of your sublists, and where two vertices have an edge between them if they're elements of the same sublist. You could build an adjacency-list representation of your input and run a graph search algorithm over it, or you could iterate over your input and build disjoint sets. Here's a slightly-modified connected components algorithm I wrote up for a similar question:

import collections

# build an adjacency list representation of your input
graph = collections.defaultdict(set)
for l in input_list:
    if l:
        first = l[0]
        for element in l:
            graph[first].add(element)
            graph[element].add(first)

# breadth-first search the graph to produce the output
output = []
marked = set() # a set of all nodes whose connected component is known
for node in graph:
    if node not in marked:
        # this node is not in any previously seen connected component
        # run a breadth-first search to determine its connected component
        frontier = set([node])
        connected_component = []
        while frontier:
            marked |= frontier
            connected_component.extend(frontier)

            # find all unmarked nodes directly connected to frontier nodes
            # they will form the new frontier
            new_frontier = set()
            for node in frontier:
                new_frontier |= graph[node] - marked
            frontier = new_frontier
        output.append(tuple(connected_component))
Community
  • 1
  • 1
user2357112
  • 260,549
  • 28
  • 431
  • 505
  • 1
    I noticed the code doesn't return a sublist if it consists of only 1 item and is not a member of any other sublist. `[[1, 2], [3]]` returns `[(1, 2)]`. Is it supposed to be like that; are we only interested in the connected components? – RussW Jul 23 '13 at 16:22
  • @RussW: Whoops. The code was originally written for a question where all the sublists were length 2; I forgot to handle the length-1 case when I modified it for variable-length inputs. – user2357112 Jul 23 '13 at 16:33
  • Still brilliant though. – RussW Jul 23 '13 at 16:49
1

Here is an answer without any imports:

def func(L):
    r = []
    cur = set()
    for l in L:
        if not cur:
            cur = set(l)
        if any(i in cur for i in l):
            cur.update(l)
        else:
            r.append(cur)
            cur = set(l)
    r.append(cur)
    while len(r)>1:
        if any(i in r[0] for i in r[-1]):
            r[-1].update(r.pop(0))
        else:
            break
    return r

Using it:

>>> func([[9, 2, 7], [9, 7], [2, 7], [1, 0], [0, 5, 4]])
[set([9, 2, 7]), set([0, 1, 4, 5])]
>>> func([[0],[1],[2],[0,1]])
[set([2]), set([0, 1])]

You can remove the set and return a list of lists by changing r.append(cur) into r.append(list(cur)), but I think it is neater to return sets.

Inbar Rose
  • 41,843
  • 24
  • 85
  • 131
  • Doesn't work. `func([[0],[1],[2],[0,1]]) == [set([0]), set([1]), set([2]), set([0, 1])]`. – user2357112 Jul 23 '13 at 08:08
  • Why is that not a correct result? From what I understand only intersections between neighboring lists are grouped? – Inbar Rose Jul 23 '13 at 08:09
  • I fixed the issue, I didn't initially realize that the lists were supposed to cycle... This is not the most efficient solution, but the logic is simple. – Inbar Rose Jul 23 '13 at 08:41
0

This one uses sets:

>>> l = [[9, 2, 7], [9, 7], [2, 7], [1, 0], [0, 5, 4]]
>>> done = []
>>> while len(done) != len(l):
    start = min([i for i in range(len(l)) if i not in done])
    ref = set(l[start])
    for j in [i for i in range(len(l)) if i not in done]:
        if set(l[j]) & ref:
            done.append(j)
            ref |= set(l[j])
    print ref


set([2, 7, 9])
set([0, 1, 4, 5])
Emmanuel
  • 13,935
  • 12
  • 50
  • 72
0

I propose that you examine each pair of list with itertools

import itertools, numpy

ls_tmp_rmv = []

while True:
    ls_tmp = []

    for s, k in itertools.combinations(lisst, 2):
        if len(set(s).intersection( set(k) )) > 0:

            ls_tmp = ls_tmp + [numpy.unique(s + k).tolist()]

            if [s] not in ls_tmp:
                ls_tmp_rmv = ls_tmp_rmv + [s]
            if [k] not in ls_tmp:
                ls_tmp_rmv = ls_tmp_rmv + [k]
        else:
            ls_tmp = ls_tmp + [s] + [k]

    ls_tmp = [ls_tmp[i] for i in range(len(ls_tmp)) if ls_tmp[i] 
                    not in ls_tmp[i+1:]]
    ls_tmp_rmv = [ls_tmp_rmv[i] for i in range(len(ls_tmp_rmv)) 
                     if ls_tmp_rmv[i] not in ls_tmp_rmv[i+1:]]

    ls_tmp = [X for X in ls_tmp if X not in ls_tmp_rmv]

    if ls_tmp == lisst :
        break
    else:
        lisst = ls_tmp

print lisst

You take all combinations of all pairs of lists in your list and check whether there are elements in common. If so, you merge the pair. If not, you add both peers in the pair. You keep in mind the elements you merged to remove them from the resulting list in the end.

With the list

lisst = [[1,2], [2,3], [8,9], [3,4]]

you do get

[[1, 2, 3, 4], [8, 9]]
kiriloff
  • 25,609
  • 37
  • 148
  • 229
0
def intersection_groups(lst):
    lst = map(set, lst)
    a, b = 0, 1
    while a < len(lst) - 1:
        while b < len(lst):
            if not lst[a].isdisjoint(lst[b]):
                lst[a].update(lst.pop(b))
            else:
                b += 1
        a, b = a + 1, a + 2
    return lst
RussW
  • 427
  • 4
  • 11