I have a list of sets called groups
. groups[i]
is a set of labels (integers) that are part of the same group as i
. For example,
# idx: 0 1 2 3 4 5 6 7
groups = [{0}, {1}, {2,5,6}, {3}, {4,7}, {2,5,6}, {2,5,6}, {4,7}]
indicates that
0
belongs to the group containing just0
1
belongs to the group containing just1
2
,5
, and6
each belong to the same group, namely the group containing2
,5
and6
3
belongs to the group containing just1
4
and7
each belong to the same group, namely the group containing4
and7
Sometimes it turns out that two groups are the same. For example, say we found out that 4
and 5
belong to the same group. Then we have to merge group {2,5,6}
with group {4,7}
. After this merge, groups
should look like:
# idx: 0 1 2 3 4 5 6 7
groups = [{0}, {1}, {2,4,5,6,7}, {3}, {2,4,5,6,7}, {2,4,5,6,7}, {2,4,5,6,7}, {2,4,5,6,7}]
Naively, this would require updating groups[2]
, groups[4]
, groups[5]
, groups[6]
and groups[7]
or, in general, when merging groupA
with groupB
, it would require len(groupA) + len(groupB)
updates.
A more efficient approach, would be if instead of having multiple copies of the same set in groups
, we had a reference to the same set
# idx: 0 1 2 3 4 5 6 7
groups = [{0}, {1}, groupA, {3}, groupB, groupA, groupA, groupB]
where groupA == {2,5,6}
and groupB == {4,7}
Then, to merge groupA
and groupB
would require at most min(len(groupA), len(groupB))
updates:
if len(groupA) < len(groupB):
groupA, groupB = groupB, groupA
# len(groupA) >= len(groupB)
groupA |= groupB
for label in groupB:
groups[label] = groupA
which would result in
# idx: 0 1 2 3 4 5 6 7
groups = [{0}, {1}, groupA, {3}, groupA, groupA, groupA, groupA]
where groupA == {2,4,5,6,7}
Is there a more efficient way to merge groups? Is Union-Find (Disjoin Set Union) data structure the way?