Organize list of tuples into distinct categories

Question

Suppose you have input of the following format:

data = [(1, 5), (7, 2), (3, 4), (4, 8), (6, 3), (5, 2)]

I want to organize these numbers into a separate bucket or list.

If a number from a tuple is found inside another tuple then it means that those numbers should go into the same bucket; otherwise, into a different bucket. For example, from the above example, the numbers will get distributed into two buckets: bucket_a = {1, 5, 2, 7} because of these tuples:

(1, 5)
(5, 2)
(7, 2)

and bucket_b = {3, 4, 6, 8} because of these tuples:

(3, 4)
(4, 8)

As output, do you want lists of tuples or just the buckets as sets? — Olivier Melançon, May 26 '18 at 03:37
Also see https://stackoverflow.com/q/17803919, https://stackoverflow.com/q/9353802, https://stackoverflow.com/q/12063545, https://stackoverflow.com/q/13808774 to name a few. — AGN Gazer, May 26 '18 at 04:17

arshajii · Accepted Answer · 2018-05-26T03:55:41.763

This is basically a connected components problem where data defines the edges of your graph. One way to solve it is using a disjoint-set data structure. Here's an example:

>>> from collections import defaultdict
>>> sets = defaultdict(set)
>>> for a in range(1,9): sets[a].add(a)
...
>>> for a,b in data:
...     s = sets[a] | sets[b]
...     for c in s: sets[c] = s
...
>>> sets
defaultdict(<type 'set'>, {1: set([1, 2, 5, 7]), 2: set([1, 2, 5, 7]), 3: set([8, 3, 4, 6]), 4: set([8, 3, 4, 6]), 5: set([1, 2, 5, 7]), 6: set([8, 3, 4, 6]), 7: set([1, 2, 5, 7]), 8: set([8, 3, 4, 6])})

Note that there are much better disjoint-set implementations than what I've done here, but the idea is the same.

To get a list of unique buckets just get the unique values in sets by id:

>>> seen = set()
>>> for s in sets.values():
...     if id(s) not in seen:
...         print s
...         seen.add(id(s))
...
set([1, 2, 5, 7])
set([8, 3, 4, 6])

score 1 · Answer 2 · answered May 26 '18 at 04:00

If you need to keep track of the buckets and the edges that generated those, this solution uses a dict to keep track of both.

def get_buckets(data):
    buckets = {}
    for x, y in data:
        if x in buckets and y in buckets:
            if buckets[x] is buckets[y]:
                buckets[x].append((x, y))
            else:
                buckets[x].extend(buckets[y])
                buckets[x].append((x, y))
                for a in sum(buckets[y], ()):
                    buckets[a] = buckets[x]
        elif x in buckets:
            buckets[x].append((x, y))
            buckets[y] = buckets[x]
        elif y in buckets:
            buckets[y].append((x, y))
            buckets[x] = buckets[y]
        else:
            buckets[x] = buckets[y] = [(x, y)]

    return {frozenset(sum(bucket, ())): bucket
            for bucket in map(frozenset, buckets.values())}

Example:

data = [(1, 5), (7, 2), (3, 4), (4, 8), (6, 3), (5, 2)]
get_buckets(data)
# output:
# {frozenset({1, 2, 5, 7}): frozenset({(1, 5), (5, 2), (7, 2)}),
#  frozenset({8, 3, 4, 6}): frozenset({(6, 3), (3, 4), (4, 8)})}

score 0 · Answer 3 · answered May 26 '18 at 03:56

It may not be the most optimal, but here is my solution:

In [101]: data = [(1, 5), (7, 2), (3, 4), (4, 8), (6, 3), (5, 2)]

In [102]: def merge(data):
     ...:     ndata = len(data)
     ...:     sets = [set(data[0])]
     ...:     for d in data[1:]:
     ...:         found = False
     ...:         for s in sets:
     ...:             if s.intersection(d):
     ...:                 s.update(d)
     ...:                 found = True
     ...:                 break
     ...:         if not found:
     ...:             sets.append(set(d))
     ...:     if len(sets) == ndata:
     ...:         return sets
     ...:     else:
     ...:         return merge(sets)
     ...:     

In [103]: merge(data)
Out[103]: [{1, 2, 5, 7}, {3, 4, 6, 8}]

Organize list of tuples into distinct categories

3 Answers3