Split a tuple of tuples (or list of lists) of paired values into independent complete sets?

Question

I have paired values in a csv file. Neither value in a pair is necessarily unique, so the same value can appear in many pairs. I would like to split this large list into independent complete sets for further analysis.

To illustrate, my "megalist" is like:

megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'],['b', 'f'], ['r', 's'], ['t', 'r']...]

Most importantly, the output should preserve the list of paired values (i.e., not consolidate the values). Ideally, each set would eventually be written to its own csv file for individual analysis later. For example, this megalist would be split into:

completeset1 = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f']]
completeset2 = [['r', 's'], ['t', 'r']]
...

In a graph theory context, I'm trying to take a giant graph of mutually exclusive subgraphs (where the paired values are connected vertices) and split them into independent graphs that are more manageable. Thanks for any input!

Edit 1: This put me in a place from which I can move forward. Thanks again!

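import csv
import networkx as nx

# Read the tab-delimited pairs; each row becomes one edge.
with open('megalistfile.csv') as input_file:
    megalist = list(csv.reader(input_file, delimiter='\t'))

G = nx.Graph()
G.add_edges_from(megalist)

# Each connected component is one independent set of vertices.
subgraphs = nx.connected_components(G)

with open('subgraphs.txt', 'w') as output_file:
    for subgraph in subgraphs:
        # G.edges(subgraph) lists the edges touching this component's vertices.
        output_line = str(G.edges(subgraph)) + '\n'
        output_file.write(output_line)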
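
The question also asks for one csv file per complete set. A minimal sketch of that step, assuming the sets have already been computed as lists of pairs; the function name `write_complete_sets`, the `completeset` prefix and the numbered file names are hypothetical choices, not something from the original code:

```python
import csv
import os


def write_complete_sets(complete_sets, out_dir, prefix="completeset"):
    """Write each group of pairs to its own tab-delimited file.

    The prefix and the numbering scheme are illustrative assumptions.
    Returns the paths written, in the same order as the input groups.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, pairs in enumerate(complete_sets, start=1):
        path = os.path.join(out_dir, "%s%d.csv" % (prefix, index))
        # The csv module documentation recommends newline="" for writers.
        with open(path, "w", newline="") as handle:
            csv.writer(handle, delimiter="\t").writerows(pairs)
        paths.append(path)
    return paths
```

Calling `write_complete_sets(groups, 'output')` would then produce `output/completeset1.csv`, `output/completeset2.csv`, and so on, using the same tab delimiter as the input file.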

Sounds like you want a BFS against both elements in the pair. — Ignacio Vazquez-Abrams, Sep 07 '12 at 15:19
This has been answered previously here: http://stackoverflow.com/a/1348995/1267329 — Simeon Visser, Sep 07 '12 at 15:21
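
For reference, the BFS idea from the first comment could look roughly like the following. This is only a sketch with no third-party dependencies; the function name `split_pairs_bfs` is made up for illustration, and it assumes every row holds exactly two hashable values:

```python
from collections import defaultdict, deque


def split_pairs_bfs(pairs):
    """Group pairs by connected component using breadth-first search."""
    # Adjacency map: value -> set of values it is paired with.
    neighbours = defaultdict(set)
    for left, right in pairs:
        neighbours[left].add(right)
        neighbours[right].add(left)

    # Label every value with the index of the component it belongs to.
    component_of = {}
    count = 0
    for start in neighbours:
        if start in component_of:
            continue
        component_of[start] = count
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if nxt not in component_of:
                    component_of[nxt] = count
                    queue.append(nxt)
        count += 1

    # Put each original pair, unchanged, into its component's list.
    groups = [[] for _ in range(count)]
    for pair in pairs:
        groups[component_of[pair[0]]].append(pair)
    return groups


megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f'], ['r', 's'], ['t', 'r']]
for group in split_pairs_bfs(megalist):
    print(group)
```

Because the pairs are copied into the groups as they are, their order and orientation (e.g. `['t', 'r']` rather than `['r', 't']`) are preserved.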

score 6 · Accepted Answer · answered Sep 07 '12 at 15:21

You can use networkx for this. Constructing the graph:

>>> import networkx as nx
>>> megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'],['b', 'f'], ['r', 's'], ['t', 'r']]
>>> G = nx.Graph()
>>> G.add_edges_from(megalist)

Then to get the list of subgraphs:

>>> subgraphs = nx.connected_components(G)
>>> subgraphs
[['a', 'b', 'd', 'f'], ['s', 'r', 't']]
>>> [G.edges(subgraph) for subgraph in subgraphs]
[[('a', 'b'), ('a', 'd'), ('b', 'd'), ('b', 'f')], [('s', 'r'), ('r', 't')]]
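
Note that in more recent networkx releases (2.x and later, as far as I know) `connected_components` returns a generator of sets rather than a list of lists, so the session above may print differently. The sketch below is one way to handle that and to keep the original pairs untouched, as the question asks; the names `component_of` and `complete_sets` are just illustrative:

```python
import networkx as nx

megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f'], ['r', 's'], ['t', 'r']]

G = nx.Graph()
G.add_edges_from(megalist)

# Materialise the components; each one is a set of vertices.
components = list(nx.connected_components(G))

# Map every vertex to the index of the component that contains it.
component_of = {}
for index, component in enumerate(components):
    for vertex in component:
        component_of[vertex] = index

# Both ends of a pair share a component, so looking up the first value is enough.
complete_sets = [[] for _ in components]
for pair in megalist:
    complete_sets[component_of[pair[0]]].append(pair)
```

Grouping the original rows this way, instead of reading the edges back from the graph, keeps each pair in the orientation it had in the input.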

score 0 · Answer 2 · answered Sep 07 '12 at 16:04

A very simple algorithm using Counter: http://docs.python.org/library/collections.html#collections.Counter

from collections import Counter

megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'],['b', 'f'], ['r', 's'], ['t', 'r']]

result = []
for l in megalist:
    cl = Counter(l)
    if not result:
        result.append([l])
    else:
        add = False
        for result_item in result:
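            # True if l shares a value with any pair already in this group.
            # any() behaves the same on Python 2 and 3, unlike bool(filter(...)).
            add = any(cl & Counter(e) for e in result_item)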

            if add and l not in result_item:
                result_item.append(l)
                break

        if not add:
            result.append([l])


print(result)

[[['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f']], [['r', 's'], ['t', 'r']]]
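
One caveat with a single greedy pass like this: a pair is added to the first group it overlaps, and groups are never merged afterwards. With input such as `[['a', 'b'], ['c', 'd'], ['b', 'c']]`, the pair `['c', 'd']` ends up in a group of its own even though `['b', 'c']` later connects it to the first one. A union-find (disjoint-set) structure avoids that. The following is a rough sketch of that alternative technique, not a fix of the code above, and `split_pairs_union_find` is a made-up name:

```python
def split_pairs_union_find(pairs):
    """Group pairs into connected sets with a disjoint-set structure."""
    parent = {}

    def find(value):
        # Register unseen values as their own root.
        parent.setdefault(value, value)
        root = value
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited value straight at the root.
        while parent[value] != root:
            next_value = parent[value]
            parent[value] = root
            value = next_value
        return root

    # Union the two values of every pair.
    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right

    # Collect the original pairs under the root of their first value.
    groups = {}
    for pair in pairs:
        groups.setdefault(find(pair[0]), []).append(pair)
    return list(groups.values())


print(split_pairs_union_find([['a', 'b'], ['c', 'd'], ['b', 'c']]))
```

All three pairs in that example land in one group, because the unions are finished before any pair is assigned to a group.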

score -2 · Answer 3 · answered Sep 07 '12 at 15:21

You could manually define your sublists using slicing:

completeset1=megalist[0:4]
completeset2=megalist[4:]

However, it really sounds like you'd like to apply some deeper logic, or use additional data, to create these segments automatically according to some condition. It's hard to advise without knowing more about what logic you'd like to apply.

Edit: the comments to the question may be good pointers.
