Given a list of sets of strings, such as setlist = [{'this','is'},{'is','a'},{'test'}], the idea is to merge (union) any pair of sets that share at least one string, and to repeat until no two sets overlap. The snippet below takes the literal approach: test each pair for overlap, join the pair, and start over using an inner-loop break.
I know this is the pedestrian approach, and it does take forever for lists of usable size (200K sets of between 2 and 10 strings).
Any advice on how to make this more efficient? Thanks.
merged = True
while merged:                                # stop once a full pass finds no overlap
    merged = False
    for i in range(len(setlist) - 1):
        for j in range(i + 1, len(setlist)):
            a = setlist[i]
            b = setlist[j]
            if not a.isdisjoint(b):          # ... then join them
                newset = a | b               # ... new set
                del setlist[j]               # ... drop highest index
                del setlist[i]               # ... drop lowest index
                setlist.insert(0, newset)    # ... introduce consolidated set, which messes up i,j
                merged = True
                break                        # ... back to the top for fresh i,j
        else:
            continue                         # inner loop found no overlap, try next i
        break                                # a merge happened, restart the scan
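
For comparison, one direction that might avoid restarting the scan after every merge is a union-find (disjoint-set) pass over the strings themselves. The sketch below is only an illustration under a few assumptions: the names `merge_overlapping` and `find` are made up for this example, the order of the resulting sets does not matter, and empty sets can be dropped. I have not timed it on the 200K case.

```python
def merge_overlapping(sets):
    """Merge sets that share at least one element, directly or transitively."""
    parent = {}  # maps each string to its parent in the union-find forest

    def find(x):
        # Walk up to the root of x's tree.
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for s in sets:
        first = None
        for item in s:
            parent.setdefault(item, item)  # register unseen strings as their own root
            if first is None:
                first = item
            else:
                ra, rb = find(first), find(item)
                if ra != rb:
                    parent[rb] = ra        # link the two groups

    # Collect strings by their root to rebuild the merged sets.
    groups = {}
    for item in parent:
        groups.setdefault(find(item), set()).add(item)
    return list(groups.values())


setlist = [{'this', 'is'}, {'is', 'a'}, {'test'}]
print(merge_overlapping(setlist))
```

On the example list this should leave two sets, {'this', 'is', 'a'} and {'test'}, though the order in which they are printed may vary. Each string is visited a small number of times, so the work grows roughly with the total number of strings rather than with the number of pairs of sets.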