7

I'm new to networkx and could do with some help please.

I have a set of data which I've processed to generate the nodes and edges. There are around 5000 groups of nodes that have more than 2 links within them (up to 10 nodes in the group in total). But the problem is that there are also several thousand pairs of nodes that only have 1 edge between them, i.e node a is linked to node b but neither are linked to any other node.

I want to remove these paired nodes from the chart.

Is there a way to filter these out?

Joel
  • 22,598
  • 6
  • 69
  • 93
A Rob4
  • 1,278
  • 3
  • 17
  • 35
  • Let me see if I understand: You have a graph, `G` and you want to remove all nodes from `G` that are not part of components of 3 or more nodes? – Joel Jul 11 '16 at 14:18
  • Joel - that is a much more elegant way of describing the problem: its exactly that! – A Rob4 Jul 11 '16 at 14:29
  • I've edited the title to make it more useful for other people who may find this through a search engine. Feel free to modify if you don't think I've appropriately changed it. – Joel Jul 11 '16 at 18:22

1 Answers1

12

So our goal is to remove all nodes from components with less than 3 nodes (this includes isolated nodes if they exist).

for component in list(nx.connected_components(G)):
    if len(component)<3:
        for node in component:
            G.remove_node(node)

A small warning is in order when using nx.connected_components. It returns a generator of components. If I didn't put list around it, it would generate one at a time, and then perform the steps for the given component. Once all that is done, it would generate the next component. But because G has been modified, python can't be sure that this behaves well. So it would die (complaining that a dictionary changed size --- the number of nodes in G changed). By turning it into a list, the components are all found before it starts the loop. So the graph won't be changing while the components are being found.

Community
  • 1
  • 1
Joel
  • 22,598
  • 6
  • 69
  • 93