I want to group tuples based on similar attributes

Question

I have a list of tuples. [ (1, 2), (2, 3), (4, 3), (5, 6), (6, 7), (8, 2) ]

I want to group them into lists based on which tuples are connected (have related values).

So the end result is two lists of related tuple values = [ [1, 2, 3, 4, 8], [5, 6, 7] ]

How can I write a function to do this? This was a job interview question. I was trying to do it in Python, but I'm frustrated and just want to see the logic behind the answer, so even psuedo code would help me, so I can see what I did wrong.

I only had a few minutes to do this on the spot, but here is what I tried:

def find_partitions(connections):
 theBigList = []     # List of Lists
 list1 = []          # The initial list to store lists
 theBigList.append(list1)

 for list in theBigList:
 list.append(connection[1[0], 1[1]])
     for i in connections:
         if i[0] in list or i[1] in list:
             list.append(i[0], i[1])

         else:
             newList = []
             theBigList.append(newList)

Essentially, the guy wanted an list of lists of values that were related. I tried to use a for loop, but realized that it wouldn't work, and then time ran out.

What have you tired? Stack Overflow is here to help you with an exact problem, not do it for you. — Gareth Latty, Dec 10 '12 at 20:37
What result would you expect if the input list was: `[ (1, 2), (4, 3), (2, 3), (5, 6), (6, 7), (8, 2) ]` -- same thing? — mgilson, Dec 10 '12 at 20:37
You could have a look at the source about how `networkx` does it: `import networkx as nx; g = nx.Graph([ (1, 2), (2, 3), (4, 3), (5, 6), (6, 7), (8, 2) ]); nx.connected_components(g) [[8, 1, 2, 3, 4], [5, 6, 7]]` — Jon Clements, Dec 10 '12 at 20:43
Sounds like a problem of Connectivity (http://en.wikipedia.org/wiki/Connectivity_%28graph_theory%29). Each number is a Node in the graph, the tuple pairs are edges, and you wish to categorize nodes by "islands" of connected nodes. The wiki article has some pseudocode under "Computational aspects" that may help you. — Kevin, Dec 10 '12 at 20:45
This is a special case of what's known as "set consolidation". See [this page](http://rosettacode.org/wiki/Set_consolidation#Python) for examples in multiple languages, Python included. — DSM, Dec 10 '12 at 21:03
And [this earlier question](http://stackoverflow.com/questions/9110837/python-simple-list-merging-based-on-intersections) might come in handy. It has lots of options, including timing results. — DSM, Dec 10 '12 at 21:14

Andy Hayden · Accepted Answer · 2012-12-10T22:27:41.947

As we fill in the components, at each stage there are three cases to consider (as you will have to match up overlapping groups):

Neither x or y are in any component already found.
Both are already in different sets, x in set_i and y in set_j.
Either one or both are in one component, x in set_i or y in a set_i.

We can use the built-in set to help. (see @jwpat's and @DSM's trickier examples):

def connected_components(lst):
    components = [] # list of sets
    for (x,y) in lst:
        i = j = set_i = set_j = None
        for k, c in enumerate(components):
            if x in c:
                i, set_i = k, c
            if y in c:
                j, set_j = k, c

        #case1 (or already in same set)
        if i == j:
             if i == None:
                 components.append(set([x,y]))
             continue

        #case2
        if i != None and j != None:
            components = [components[k] for k in range(len(components)) if k!=i and k!=j]
            components.append(set_i | set_j)
            continue

        #case3
        if j != None:
            components[j].add(x)
        if i != None:
            components[i].add(y)

    return components               

lst = [(1, 2), (2, 3), (4, 3), (5, 6), (6, 7), (8, 2)]
connected_components(lst)
# [set([8, 1, 2, 3, 4]), set([5, 6, 7])]
map(list, connected_components(lst))
# [[8, 1, 2, 3, 4], [5, 6, 7]]

connected_components([(1, 2), (4, 3), (2, 3), (5, 6), (6, 7), (8, 2)])
# [set([8, 1, 2, 3, 4]), set([5, 6, 7])] # @jwpat's example

connected_components([[1, 3], [2, 4], [3, 4]]
# [set([1, 2, 3, 4])] # @DSM's example

This certainly won't be the most efficient method, but is perhaps one similar to what they would expect. As Jon Clements points out there is a library for these type of calculations: networkx, where they will be much more efficent.

Note erroneous output from `connected_components([ (1, 2), (4, 3), (2, 3), (5, 6), (6, 7), (8, 2) ])` -- [set([8, 1, 2, 3]), set([3, 4]), set([5, 6, 7])] — James Waldby - jwpat7, Dec 10 '12 at 21:02
I don't think sorting will work: consider `[[1,3],[2,4],[3,4]]`. — DSM, Dec 10 '12 at 21:34
@hayden: doing a set consolidation on the spot is tricky unless you've seen it before, and then it's more a memory test.. so I wouldn't worry. :^) — DSM, Dec 10 '12 at 21:40
@DSM I worried! Now at least I've done the exercise (I think...) — Andy Hayden, Dec 10 '12 at 22:28

score 1 · Answer 2 · answered Dec 10 '12 at 20:57

l = [ (1, 2), (2, 3), (4, 3), (5, 6), (6, 7), (8, 2) ]

# map each value to the corresponding connected component
d = {}
for i, j in l:
  di = d.setdefault(i, {i})
  dj = d.setdefault(j, {j})
  if di is not dj:
    di |= dj
    for k in dj:
      d[k] = di

# print out the connected components
p = set()
for i in d.keys():
  if i not in p:
    print(d[i])
  p |= d[i]

mgilson · Answer 3 · 2012-12-10T20:56:39.573

0

This certainly isn't elegant, but it works:

def _grouper(s,ll):
    for tup in ll[:]:
        if any(x in s for x in tup):
            for y in tup:
                s.add(y)
                ll.remove(tup)

def grouper(ll,out=None):
    _ll = ll[:]
    s = set(ll.pop(0))
    if out is None:
        out = [s]
    else:
        out.append(s)

    l_old = 0
    while l_old != len(_ll):
        l_old = len(_ll)
        _grouper(s,_ll)

    if _ll:
        return grouper(_ll,out=out)
    else:
        return out

ll = [ (1, 2), (2, 3), (4, 3), (5, 6), (6, 7), (8, 2) ]
print grouper(ll)

edited Dec 10 '12 at 20:56

answered Dec 10 '12 at 20:51

mgilson

300,191
65
633
696

@JonClements -- Thanks. for some reason aquamacs is not indenting nicely today ... – mgilson Dec 10 '12 at 20:57

Ashwini Chaudhary · Answer 4 · 2012-12-10T22:04:21.127

0

using sets:

In [235]: def func(ls):
    new_lis=sorted(sorted(ls),key=min) 
    lis=[set(new_lis[0])]
    for x in new_lis[1:]:
            for y in lis:
                    if not set(x).isdisjoint(y):
                            y.update(x);break 
            else:lis.append(set(x))
    return lis
   .....: 

In [236]: func([(3, 1), (9, 3), (6, 9)])
Out[236]: [set([1, 3, 6, 9])]

In [237]: func([[2,1],[3,0],[1,3]])
Out[237]: [set([0, 1, 2, 3])]

In [239]: func([(1, 2), (4, 3), (2, 3), (5, 6), (6, 7), (8, 2)])
Out[239]: [set([8, 1, 2, 3, 4]), set([5, 6, 7])]

edited Dec 10 '12 at 22:04

answered Dec 10 '12 at 20:56

Ashwini Chaudhary

244,495
58
464
504

`func([[2,1],[3,0],[1,3]])` gives `[set([0, 1, 3]), set([1, 2])]`, I think. – DSM Dec 10 '12 at 21:09
@DSM you're right, solution edited. I think this one is correct. – Ashwini Chaudhary Dec 10 '12 at 21:17
`func([[8,5], [5,6], [1,2]])`.. I think you're evolving toward the first solution [here](http://rosettacode.org/wiki/Set_consolidation#Python). – DSM Dec 10 '12 at 21:28
@DSM I don't understand? the following list gives me `[set([1, 2]), set([8, 5, 6])]`, So is my solution correct or incorrect? – Ashwini Chaudhary Dec 10 '12 at 21:45
Oops, indentation error on my part. When I put your `else:` on a level to match the `for y in lis:`, I get the right answer for that once, but `func([(3, 1), (9, 3), (6, 9)])` gives `[set([1, 3, 9]), set([9, 6])]` when it should merge everything into one. – DSM Dec 10 '12 at 21:53
@DSM I believe this time its finally correct, but I hope you might still come up with a good test input and prove it wrong again. :) – Ashwini Chaudhary Dec 10 '12 at 22:05

score 0 · Answer 5 · answered Sep 12 '13 at 04:37

How about

ts = [(1, 2), (2, 3), (4, 3), (5, 6), (6, 7), (8, 2)]
ss = []
while len(ts) > 0:
    s = set(ts.pop())
    ol = 0
    nl = len(s)
    while ol < nl:
        for t in ts:
            if t[0] in s or t[1] in s: s = s.union(ts.pop(ts.index(t)))
        ol = nl
        nl = len(s)
    ss.append(s)

print ss

I want to group tuples based on similar attributes

5 Answers5

Linked

Related