I have a list of lists where each sublist contains some integers:
o = [[1,2],[3,4],[2,3],[5,4]]
I'd like to create a new list of lists in which any two sublists in o
that share a common member will be merged. This merging process should continue until no two sublists share a common element. Given o
, we would merge [1,2]
with [2,3]
because they share a 2, then we'd merge that group with [3,4]
because [1,2,3]
and [3,4]
both contain a 3, and so on.
The expected output of clustering o
would be [[1,2,3,4,5]]
I have a hunch there exists an approach to this task that's far superior to my current approach (see below). Any suggestions others can offer on the most efficient (in time, then space) way to accomplish this task would be greatly appreciated.
from collections import defaultdict
o = [[1,2],[3,4],[2,3],[5,4]]
def group_lists(list_of_lists):
'''
Given a list of lists, continue combining sublist
elements that share an element until no two sublist
items share an element.
'''
to_cluster = set(tuple(i) for i in list_of_lists)
keep_clustering = True
while keep_clustering:
keep_clustering = False
d = defaultdict(set)
for i in to_cluster:
for j in i:
d[j].add(i)
clustered = set()
for i in d.values():
# remove duplicate entries from each cluster
flat = tuple(set([item for sublist in i for item in sublist]))
clustered.add(flat)
if not to_cluster == clustered:
keep_clustering = True
to_cluster = clustered
# done clustering!
return clustered
print(group_lists(o))