What was written [in the question's initial version, which was later edited] sounds like you want the transitive closure, but actually you need a symmetric closure as well, so that you can merge (10,22)
, (10,20)
, and (10,69)
.
def merge_sets(sets):
for i in range(len(sets) - 1, 0, -1):
for j in range(i - 1, -1, -1):
if sets[i] & sets[j]: # check for intersection (non-empty -> True)
sets[j] |= sets[i] # merge i-th set into j-th set
sets.pop(i) # remove i-th set
break # terminate inner loop and continue with next i
return sets
tuples = [(10,22), (10,20), (10,69), (34,18), (18,17), \
(89,990), (86,80), (174,175), (543,542)]
sets = [set(t) for t in tuples]
merged_sets = merge_sets(sets)
merged_tuples = [tuple(s) for s in merged_sets]
# merged_tuples:
# [(20, 69, 22, 10), (17, 34, 18), (89, 990), (80, 86), (174, 175), (542, 543)]
This code is straightforward. It first converts the tuples to sets, so that we can conveniently check for shared elements (set intersection). At the end, we convert the sets back to tuples.
The function merge_sets
compares all sets to each other. Whenever two sets intersect, we merge them and continue.
Three points deserve further explanation:
- We can safely merge the i-th set into the j-th set because we already compared the former to all the sets in between. (If we did it the other way round, we would have to compare sets we already partially compared: there could be sets in the middle that intersect with the j-th set but not with the i-th set. For a minimal example, try the input
[(1, 2), (3, 4), (2, 3)]
(left-to-right traversal) or [(2, 3), (3, 4), (1, 2)]
(right-to-left traversal, as in my code).) Because we are merging sets in the same direction as the one in which the loops progress (here: right to left; the choice is algorithmically arbitrary, but see the next bullet point), we can simply continue with the loops and don't have to restart them at an earlier point after every merger.
- Both loops count down because removing Python
list
elements is faster from the right than from the left. (To get fast removals from the left, we could instead use collections.deque
.)
- The reason why this problem is best tackled with traditional loop constructs (as opposed to fancier Python constructs, such as list comprehensions or
map
) is that we are dealing with mutable objects (we will modify both the list as well as the containing items). Even with traditional loops, we need to be careful to modify items in the right way (by merging them in the direction of the loop progression; see first bullet point).
Improved solution:
Because all tuples might be distinct in the "worst" case, there seems to be no way around comparing all of them (or their corresponding sets) to each other for intersecting elements, leading to n*(n+1)/2 comparisons. However, we can speed things up slightly by reducing the number of repeated partial comparisons. For example, if set i is merged into set j, but there is another set m in the middle that intersects with set i, we will, upon reaching set m, have to compare set m with the new set j. But due to the way the loops above work, we have already compared those elements from set j to set m if they originally came from set i.
To avoid such partial duplication, we simply move j to the outer loop (still with right-to-left traversal) and then mark the sets to the right of set j which intersect with it. Only afterwards, we then merge them all in one go into set j.
def merge_sets(sets):
for j in range(len(sets) - 2, -1, -1): # loop from len(sets)-2 down to 0
# mark indices from len(sets)-1 down to j+1 for which the sets intersect:
merge_into_j = [i for i in range(len(sets) - 1, j, -1) if sets[j] & sets[i]]
for i in merge_into_j: # merge_into_j contains indices in RTL direction
sets[j] |= sets[i] # merge i-th set into j-th set
sets.pop(i) # remove i-th set
return sets
(Note how the list comprehension might see progressively smaller values of len(sets)
upon each iteration.)
Credit goes to user Crazy Chucky for reminding me that non-empty sets evaluate to True
.