4

I have a >10k list of (unordered) pairs of numbers. I'd like to classify them into sets of connected pairs either directly or indirectly. I think this corresponds to undirected graph. I'm using python, and tried something like this to represent this structure.

In order to know all the numbers connected to i, I can examine whether there is a path from i to j for all j in the list except i. However, with this implementation, the computation time gets too long for the size of list I'm dealing with. Is there a more efficient way to do this? (Or is there an already established python libraries?)

Community
  • 1
  • 1
pyrookie
  • 400
  • 3
  • 11

1 Answers1

8

It sounds as though you are interested in computing the connected components of a graph. I would suggest looking into the networkx package and its tools for computing components.

For example, suppose our data is a list of pairs of numbers, each pair representing an edge in the graph:

pairs = [
    (1, 2),
    (2, 4),
    (3, 5),
    (2, 5),
    (7, 9),
    (9, 10),
    (8, 7)
]

In the graph represented by these edges, there is a path between any pair of nodes in the set {1, 2, 3, 4, 5}, and there is also a path between any pair of nodes in {6, 7, 8, 9, 10}. But there is no path, say, from 5 to 7. This is to say that there are two connected components in the graph.

To discover these components, we first import networkx and create a graph:

>>> import networkx as nx
>>> graph = nx.from_edgelist(pairs)

Computing the components is as simple as

>>> list(nx.connected_components(graph))
>>> [{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}]

nx.connected_components is a generator, and so here we converted the result into a list in order to show all of the connected components.

We can also find the connected component containing a given node:

>>> nx.node_connected_component(graph, 3)
{1, 2, 3, 4, 5}

We can also quickly count the number of connected components:

>>> nx.number_connected_components(graph)
2
jme
  • 19,895
  • 6
  • 41
  • 39
  • Yes, this was what was looking for. Thanks for detailed answer! – pyrookie Nov 01 '16 at 03:38
  • Connected components is also available in scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csgraph.connected_components.html. I don't know which implementation is faster, but it may avoid an additional dependency in your projects. – Florent F Oct 29 '19 at 10:41
  • The second link is broken – thanos.a Dec 06 '19 at 20:06