Combining tuples based on transitive relation

Question

I have a list of tuples:

[(10,22), (10,20), (10,69), (34,18), (18,17), (89,990), (86,80), (174,175), (543,542)]

I'd like to obtain a result like this:

[(10,22,20,69), (34,18,17), (89,990), (86, 80), (174,175), (543,542)]

I want to group together any of the tuples that have at least one element in common. How would I do that?

could you give us more clarification, and what you have tried so far? — XMehdi01, Mar 03 '23 at 17:54
the numbers are IDs of objects. I want to group them in groups of n elements. If object X is with object Y but object Y is with Z I want to create the group X,Y,Z etc. while if object Q is with object R I want to maintain this tuple (Q,R) — lorenzlorg, Mar 03 '23 at 17:57
`(10,22,20,69)` doesn't look like a transitive relationship. It just combines all the elements that start with `10`. But `(34, 18, 17)` look like transitive, since it combines `(34, 18)` and `(18, 17)`. — Barmar, Mar 03 '23 at 18:09
my fault! Anyway, I'd like to group them because they share at least one element, in this case 10 — lorenzlorg, Mar 03 '23 at 18:11
more than transitivity it seems more smt like a connected components decomposition — cards, Mar 03 '23 at 21:19

Lover of Structure · Answer 1 · 2023-03-07T23:18:44.683

What was written [in the question's initial version, which was later edited] sounds like you want the transitive closure, but actually you need a symmetric closure as well, so that you can merge (10,22), (10,20), and (10,69).

def merge_sets(sets):
    for i in range(len(sets) - 1, 0, -1):
        for j in range(i - 1, -1, -1):
            if sets[i] & sets[j]: # check for intersection (non-empty -> True)
                sets[j] |= sets[i] # merge i-th set into j-th set
                sets.pop(i) # remove i-th set
                break # terminate inner loop and continue with next i
    return sets

tuples = [(10,22), (10,20), (10,69), (34,18), (18,17), \
          (89,990), (86,80), (174,175), (543,542)]
sets = [set(t) for t in tuples]
merged_sets = merge_sets(sets)
merged_tuples = [tuple(s) for s in merged_sets]
# merged_tuples:
# [(20, 69, 22, 10), (17, 34, 18), (89, 990), (80, 86), (174, 175), (542, 543)]

This code is straightforward. It first converts the tuples to sets, so that we can conveniently check for shared elements (set intersection). At the end, we convert the sets back to tuples.

The function merge_sets compares all sets to each other. Whenever two sets intersect, we merge them and continue.

Three points deserve further explanation:

We can safely merge the i-th set into the j-th set because we already compared the former to all the sets in between. (If we did it the other way round, we would have to compare sets we already partially compared: there could be sets in the middle that intersect with the j-th set but not with the i-th set. For a minimal example, try the input [(1, 2), (3, 4), (2, 3)] (left-to-right traversal) or [(2, 3), (3, 4), (1, 2)] (right-to-left traversal, as in my code).) Because we are merging sets in the same direction as the one in which the loops progress (here: right to left; the choice is algorithmically arbitrary, but see the next bullet point), we can simply continue with the loops and don't have to restart them at an earlier point after every merger.
Both loops count down because removing Python list elements is faster from the right than from the left. (To get fast removals from the left, we could instead use collections.deque.)
The reason why this problem is best tackled with traditional loop constructs (as opposed to fancier Python constructs, such as list comprehensions or map) is that we are dealing with mutable objects (we will modify both the list as well as the containing items). Even with traditional loops, we need to be careful to modify items in the right way (by merging them in the direction of the loop progression; see first bullet point).

Improved solution:

Because all tuples might be distinct in the "worst" case, there seems to be no way around comparing all of them (or their corresponding sets) to each other for intersecting elements, leading to n*(n+1)/2 comparisons. However, we can speed things up slightly by reducing the number of repeated partial comparisons. For example, if set i is merged into set j, but there is another set m in the middle that intersects with set i, we will, upon reaching set m, have to compare set m with the new set j. But due to the way the loops above work, we have already compared those elements from set j to set m if they originally came from set i.

To avoid such partial duplication, we simply move j to the outer loop (still with right-to-left traversal) and then mark the sets to the right of set j which intersect with it. Only afterwards, we then merge them all in one go into set j.

def merge_sets(sets):
    for j in range(len(sets) - 2, -1, -1): # loop from len(sets)-2 down to 0
        # mark indices from len(sets)-1 down to j+1 for which the sets intersect:
        merge_into_j = [i for i in range(len(sets) - 1, j, -1) if sets[j] & sets[i]]
        for i in merge_into_j: # merge_into_j contains indices in RTL direction
            sets[j] |= sets[i] # merge i-th set into j-th set
            sets.pop(i) # remove i-th set
    return sets

(Note how the list comprehension might see progressively smaller values of len(sets) upon each iteration.)

Credit goes to user Crazy Chucky for reminding me that non-empty sets evaluate to True.

Neat. The other reason to pop from the right is that modifying a list under iteration is not going to work quite as nicely in the forward direction — Mad Physicist, Mar 04 '23 at 03:51
@MadPhysicist You are totally right – it's not only better in terms of runtime but also in terms of array index handling in the loop! — Lover of Structure, Mar 04 '23 at 08:12

score 2 · Answer 2 · answered Mar 04 '23 at 21:00

I wonder if you can get a speedup by taking the idea behind Lover of Structure's answer and moving the O(n^2) portion to operate on integers. Here is an attempt. In this version, I don't need to assume that the input is a list of sets:

from collections import defaultdict

tuples = [(10, 22), (10, 20), (10, 69), (34, 18), (18, 17),
          (89, 990), (86, 80), (174, 175), (543, 542)]
values_to_index = defaultdict(int)

# This part is a single pass over the elements
for i, t in enumerate(tuples):
    for v in t:
        values_to_index[v] |= (1 << i)

# Merging the sets is done with integer operations
# Instead of popping elements, it seems simpler to construct
# a new list, since that will be shorter than the input,
# reducing the number of searches
merged_sets = []
for s in values_to_index.values():
    # To avoid unnecessary checks for disjointness, keep track of the
    # elements that merge pre-existing sets and clean up as you go
    merges = []
    for i in range(len(merged_sets)):
        if merged_sets[i] & s:
            merged_sets[i] |= s
            merges.append(i)
    if len(merges) == 0:
        merged_sets.append(s)
    elif len(merges) > 1:
        k = merges[0]
        for i in merges[:0:-1]:
            merged_sets[k] |= merged_sets.pop(i)

# Now we have a list of integers telling us which set each tuple goes to
output = [set() for _ in range(len(merged_sets))]
for output_index, bits in enumerate(merged_sets):
    while bits:
        b = bits & (~bits + 1)
        output[output_index].update(tuples[b.bit_length() - 1])
        bits ^= b
result = [tuple(s) for s in output]

The bit iterator is courtesy of https://stackoverflow.com/a/8898977/2988730.

cards · Answer 3 · 2023-03-05T15:55:22.357

Chain together compatible tuples, i.e. those which share at least an element. Each chain will be a connected component of the original data.

def cc_from_tuples(pairs:list[tuple]) -> list[tuple]:
    """Get Connected Components"""
    ccs = []
    # new instance of the data
    s_pairs = set(pairs) # or pairs.copy()
    # iterative classification of the pairs
    while s_pairs:
        # initialize connected component
        cc = set(s_pairs.pop())
        # temporary set
        s_pairs_tmp = set()
        # classify tuples as either part of the cc or as next candidate cc
        for t in s_pairs:
            if cc.intersection(t):
                cc.update(t)
            else:
                s_pairs_tmp.add(t)
        # add completed connected component
        ccs.append(tuple(cc)) # casting
        # iteration step
        s_pairs = s_pairs_tmp

    return ccs


ts = [(10,22), (10,20), (10,69), (34,18), (18,17), (89,990), (86,80), (174,175), (543,542)]
cc_from_tuples(ts)
#[(17, 18, 34), (10, 20, 69, 22), (542, 543), (89, 990), (80, 86), (174, 175)]

ts = [(1, 0), (2, 3), (1, 4)]
cc_from_tuples(ts)
#[(0, 1, 4), (2, 3)]

ts = [(1, 0), (2, 3), (1, 2)]
cc_from_tuples(ts)
#[(0, 1, 2, 3)]

If result needs to be sorted by decreasing length of tuples modify the return-statement of the function as follow return sorted(ccs, key=len, reverse=True)

Combining tuples based on transitive relation

3 Answers3