1

I have a long list of tuples, could look like this for example:

[('5','9'), ('10','11'), ('1','2'), ('1','3'), ('1','4'), ('2','7'), ('3','8'), ('2','1'), ('3','1'), ('3','4'), ('5','6'),  ('5','10'),  ('10','12'), ('11','13'), ('13','14')]

I need to combine them into lists if they share anything in common. So the output in the example would be:

['11', '10', '13', '12', '14', '5', '6', '9']
['1', '3', '2', '4', '7', '8']

To clarify: the input is a list of tuples. I need to combine all tuples with shared element into one list. So if I will have: [('1','2'), ('1','3'), ('1','4'), ('4','5')], all the elements should all be put into one list ['1', '2', '3', '4', '5'], because they are linked through the tuples.

I tried to come up with something going through dictionaries byt failed miserably. I am sure there some "easier" solution.

thank you Martin

mkol
  • 13
  • 4

2 Answers2

0

Looks like you are doing union-find operations. https://en.wikipedia.org/wiki/Disjoint-set_data_structure

Start with creating singleton disjoint sets, each having a number occurring in your list. Next, for each tuple, union the sets corresponding to the numbers in the tuple.

The above solution will have running time nearly linear in the number of tuples. See A set union find algorithm for possible union-find implementations.

Community
  • 1
  • 1
r.v
  • 4,697
  • 6
  • 35
  • 57
0

We can frame this problem as that of finding connected components in an undirected graph. All distinct numbers that appear in the list can be treated as the nodes (vertices) of the graph, and the pairs as its edges.

Here is a simple algorithm with inline comments:

l = [('5','9'), ('10','11'), ('1','2'), ('1','3'), ('1','4'), ('2','7'), ('3','8'), ('2','1'), ('3','1'), ('3','4'), ('5','6'),  ('5','10'),  ('10','12'), ('11','13'), ('13','14')]

# get all unique elements ("nodes") of `l'
nodes = set().union(*map(set, l))

# start with each node in its own connected component
comp = {node:{node} for node in nodes}

# now repeatedly merge pairs of components connected by edges in `l'
while True:
  merged = False
  new_l = []  # will drop edges that have already been used in a merge
  for n1, n2 in l:
    if comp[n1] is not comp[n2]:
      # the two connected components are not the same, so merge them
      new_comp = comp[n1] | comp[n2]
      for n in new_comp:
        comp[n] = new_comp
      merged = True
    else:
      # keep the pair for the next iteration
      new_l.append((n1, n2))
  if not merged:
    # all done
    break
  l = new_l

# now print all distinct connected components
for c in set(map(frozenset, comp.values())):
  print list(c)

This prints out:

['1', '3', '2', '4', '7', '8']
['11', '10', '13', '12', '14', '5', '6', '9']
NPE
  • 486,780
  • 108
  • 951
  • 1,012