2

I have a (a very large) list of sets, containing pairs of values such as:

SetList = [{1,2},{2,3},{4,5},{5,6},{1,7}]

I'd like to efficiently determine the sets of values that are disjoint as implied by the transitivity of the relationships in the pairs above. For example, 1 is associated with 2, and 2 with 3 and so 1,2,3 are associated. Similarly 1 is associated with 7, so 1,2,3, and 7 are associated. In the above, 4, 5, and 6 are associated, but not with the remaining values. The result should look like the following:

DisjointSets = [{1,2,3,7},{4,5,6}]

Is there are simple, and efficient, way to perform this operation that I'm missing? Thank you!

DrTRD
  • 1,641
  • 1
  • 13
  • 18

2 Answers2

5

Converting my original list to tuples:

TupleList = [(1,2),(2,3),(4,5),(5,6),(1,7)]

I used networkx via (thanks @user2357112):

import networkx as nx
G = nx.path_graph(0)
G.add_edges_from(TupleList)
DisjointSets = list(nx.connected_components(G))

Is this the most efficient way to solve the problem? Any other ideas?

DrTRD
  • 1,641
  • 1
  • 13
  • 18
  • 2
    Once networkx is installed, this is about efficient in your time as possible. If nx.connected_components is written in C (or similar), it will take less machine time than anything you are likely to write in Python. – Terry Jan Reedy May 15 '17 at 20:04
  • 1
    If you're curious, you can see what networkx is doing with connected_components function at ../networkx/algorithms/components/connected.py, which is a basic breadth first search using generators. Something to keep in mind is that the time complexity is O(|V| + |E|), where V and E are vertices and edges, and O(|E|) can approach O(|V|^2) if your graph is densely connected. If your graph is large enough, you could try parallelizing it by adapting this networkx tutorial: https://blog.dominodatalab.com/social-network-analysis-with-networkx/ – unsupervised_learner May 15 '17 at 21:10
0

The graph approach is likely faster than recursion, but for those interested in pure Python:

def get_disjoints(lst):
    """Return disjoints."""
    def rec_disjoints(lst):
        if not lst:
            return disjoints
        else:
            chosen = lst[0]
            # Iterat/Mutate list trick using indicies
            for i, s in reversed(list(enumerate(lst[:]))):
                if not chosen.isdisjoint(s):
                    chosen.update(s)
                    del lst[i]
        disjoints.append(chosen)
        return rec_disjoints(lst)

    disjoints = []
    return rec_disjoints(lst)

lst = [{1,2}, {2,3}, {4,5}, {5,6}, {1,7}]
get_disjoints(lst)
# [{1, 2, 3, 7}, {4, 5, 6}]

This takes advantage of the helpful isdisjoint method for sets. Although, iteration + function calls + recursion will reduce performance.

Here are tests for robustness, applicable for other contributors:

import nose.tools as nt

def test_disjoint(f):
    "Verify the test function generates expected disjoints."
    def verify(lst1, lst2):
        actual, expected = lst1, lst2
        nt.eq_(actual, expected)

    verify(f([{1,2}, {2,3}, {4,5}, {5,6}, {1,7}]),
             [{1,2,3,7}, {4,5,6}])
    verify(f([{4,5}, {5,6}, {1,7}]),
             [{4,5,6}, {1,7}])
    verify(f([{1,7}]),
             [{1,7}])
    verify(f([{1,2}, {2,3}, {4,5}, {5,6}, {1,7}, {10, 11}]),
             [{1,2,3,7}, {4,5,6}, {10,11}])
    verify(f([{4,5}, {5,6}, {1,7}, {10, 11}]),
             [{4,5,6}, {1,7}, {10,11}])
    verify(f([{1,2}, {4,5}, {6,7}]),
             [{1,2}, {4,5}, {6,7}])


test_disjoint(f=get_disjoints)
pylang
  • 40,867
  • 14
  • 129
  • 121
  • I tested your approach versus the networkx approach that I outlined and it seems that networkx is generally much faster -- up to a factor of 100 in my testing. The approaches only seemed to have comparable timing when the number of disjoint sets approaches the number of unique elements. – DrTRD May 15 '17 at 20:43
  • 1
    No doubt a graph is faster than a recursive approach. Though I thought to include it so as to solve the main problem with pure python, since graphs are another concept to digest entirely. – pylang May 15 '17 at 20:46