How to delete all related nodes in a directed graph using networkx?

Question

I'm not sure what the correct terminology is for my question, so I'll just explain what I want to do. I have a directed graph, and after I delete a node I want every node that is no longer connected to the main graph to be removed as well.

Here's an example:

[Image: the example directed graph]

Say I delete node '11'. I want node '2' to be deleted as well (and in my own data there will be nodes under '2' that then have to be deleted too), because it's no longer connected to the main graph. Note that nodes '9' and '10' should not be deleted, because nodes '8' and '3' still connect to them.

I'm using the Python library networkx. I searched the documentation, but since I don't know the terminology I'm not sure what this operation is called. If possible, I would rather use a function provided by the library than write my own recursion through the graph (as my graph is quite large).

Any help or suggestions on how to do this would be great.

Thanks!

It does not contain any cycles. Each node is basically a list of daily observations that are connected to the next day's observations. Sometimes I find faulty observations, so when I delete one I want the other nodes that were derived from that deleted node to be deleted as well. — Lostsoul, Jan 08 '12 at 20:21

templatetypedef · Accepted Answer · 2012-01-11T21:08:37.997

I am assuming that the following are true:

The graph is acyclic. You mentioned this in your comment, but I'd like to make explicit that this is a starting assumption.
There is a known set of root nodes. We need to have some way of knowing what nodes are always considered reachable, and I assume that (somehow) this information is known.
The initial graph does not contain any superfluous nodes. That is, if the initial graph contains any nodes that should be deleted, they've already been deleted. This allows the algorithm to work by maintaining the invariant that every node should be there.

If this is the case, then given an initial setup, the only reason that a node is in the graph would be either

The node is one of the root nodes, or
The node has a parent that is itself reachable from a root node.

Consequently, any time you delete a node from the graph, the only nodes that might need to be deleted are that node's descendants. If the node that you remove is in the root set, you may need to prune a lot of the graph, and if the node that you remove is a descendant node with few of its own descendants, then you might need to do very little.

Given this setup, once a node is deleted, you would need to scan all of that node's children to see if any of them have no other parents that would keep them in the graph. Since we assume that the only nodes in the graph are nodes that need to be there, if the child of a deleted node has at least one other parent, then it should still be in the graph. Otherwise, that node needs to be removed. One way to do the deletion step, therefore, would be the following recursive algorithm:

For each child of the node to delete:
- If that child has exactly one parent (which must be the node we're about to delete):
  - Recursively remove that child from the graph.
Delete the specified node from the graph.
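As a rough illustration, the recursive steps above might look like this in networkx. The function name `remove_recursive` is made up for this sketch, and the edge list is the example graph from the question. It assumes the graph is a `DiGraph` with no cycles.

```python
import networkx as nx

def remove_recursive(G, node):
    """Remove `node` and every descendant that has no other parent.

    Hypothetical helper for illustration; assumes G is an acyclic nx.DiGraph.
    """
    # Copy the successors first, because the graph is mutated while we loop.
    for child in list(G.successors(node)):
        # An in-degree of 1 means `node` is the child's only remaining parent.
        if G.in_degree(child) == 1:
            remove_recursive(G, child)
    G.remove_node(node)

G = nx.DiGraph([(3, 8), (3, 10), (5, 11), (7, 11), (7, 8),
                (11, 2), (11, 9), (11, 10), (8, 9)])
remove_recursive(G, 11)
print(sorted(G.nodes()))  # [3, 5, 7, 8, 9, 10]
```

Python's default recursion limit means a long chain of single-parent nodes could raise `RecursionError` here.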

This is probably not a good algorithm to implement directly, though, since the recursion involved might get pretty deep if you have a large graph. Thus you might want to implement it using a worklist algorithm like this one:

Create a worklist W.
Add v, the node to delete, to W.
While W is not empty:
- Remove the first entry from W; call it w.
- For each of w's children:
  - If that child has just one parent, add it to W.
- Remove w from the graph.

This ends up being worst-case O(m) time, where m is the number of edges in the graph, since in the worst case every edge has to be scanned. In practice it can be much faster, because the traversal stops at any child that still has another parent.
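A minimal networkx sketch of the worklist version follows. The name `remove_with_orphans` and the extra nodes 12 and 13 under node 2 are assumptions added for illustration. Like the pseudocode, it relies on the graph being acyclic and containing no superfluous nodes to begin with.

```python
from collections import deque

import networkx as nx

def remove_with_orphans(G, v):
    """Remove v, then every descendant left without a parent.

    Hypothetical helper; returns the removed nodes in removal order.
    """
    worklist = deque([v])
    removed = []
    while worklist:
        w = worklist.popleft()
        for child in G.successors(w):
            # w is still in the graph here, so in-degree 1 means
            # w is this child's only remaining parent.
            if G.in_degree(child) == 1:
                worklist.append(child)
        G.remove_node(w)
        removed.append(w)
    return removed

# Example graph from the question, plus illustrative nodes under 2.
G = nx.DiGraph([(3, 8), (3, 10), (5, 11), (7, 11), (7, 8),
                (11, 2), (11, 9), (11, 10), (8, 9),
                (2, 12), (12, 13), (10, 13)])
print(remove_with_orphans(G, 11))  # [11, 2, 12]
print(sorted(G.nodes()))           # [3, 5, 7, 8, 9, 10, 13]
```

Node 13 survives because node 10 is still its parent. Each node is removed before the next worklist entry is examined, so a child whose parents are all doomed is picked up when its last parent is processed.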

Hope this helps!

Amazing answer, thank you. It helped me understand a lot of what is going on. Question for you: does it make a major difference if I don't know whether the initial graph has superfluous nodes? I'm taking observations on a process that has been going on for many years, so I'm starting to capture observations and hoping to clean them up as errors are noticed. — Lostsoul, Jan 08 '12 at 20:58
@Lostsoul- If you're unsure whether the initial graph has superfluous nodes, you can always run a graph search to determine what nodes are unnecessary. For example, you can run a depth-first search from each of the nodes you know are valid (the "root set"), marking each node that you encounter. You can then remove all nodes from the graph that aren't marked. This is actually quite efficient (if you use a depth-first search, it takes time proportional to the number of nodes and edges in the graph), and will set you up for this later algorithm. — templatetypedef, Jan 08 '12 at 21:07
@Lostsoul- If you have any future questions on graphs or graph algorithms, feel free to ask! — templatetypedef, Jan 08 '12 at 21:07
@templatetypedef Thank you very much! I think I understand what you're saying, but I'll experiment with code so hopefully I see it in action. Thanks for your help! On a side note, when I first started getting into algos, I asked a question about whether brute-force algos can scale, and you answered it for me. You not only taught me the idea behind dynamic programming but got me very interested in algos (in fact, I have your answer printed and refer to it often). You've also helped me recently with graphs, so I can't thank you enough for helping me lay a good foundation! Thank you so much, you rock! — Lostsoul, Jan 08 '12 at 21:18
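The cleanup pass described in the comments above (mark everything reachable from the root set, then drop the rest) could be sketched as follows. The name `prune_unreachable` is hypothetical, and treating nodes 3, 5 and 7 as the root set is an assumption based on the example graph, where they have no parents.

```python
import networkx as nx

def prune_unreachable(G, roots):
    """Remove every node not reachable from `roots`; return the removed nodes.

    Hypothetical helper: an iterative depth-first search from all roots.
    """
    marked = set()
    stack = list(roots)
    while stack:
        n = stack.pop()
        if n in marked:
            continue
        marked.add(n)
        stack.extend(G.successors(n))
    unmarked = [n for n in G if n not in marked]
    G.remove_nodes_from(unmarked)
    return unmarked

G = nx.DiGraph([(3, 8), (3, 10), (5, 11), (7, 11), (7, 8),
                (11, 2), (11, 9), (11, 10), (8, 9)])
G.remove_node(11)                       # leaves node 2 stranded
print(prune_unreachable(G, [3, 5, 7]))  # [2]
print(sorted(G.nodes()))                # [3, 5, 7, 8, 9, 10]
```

Each node is marked at most once and each edge is followed at most once, so the pass takes time proportional to the number of nodes plus edges.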

score 5 · Answer 2 · answered Jan 09 '12 at 15:35

Here is Python networkx code that solves your task:

import networkx as nx
import matplotlib.pyplot as plt  # only needed for drawing the graph
DG=nx.DiGraph()
DG.add_edges_from([(3,8),(3,10),(5,11),(7,11),(7,8),(11,2),(11,9),(11,10),(8,9)])
DG.remove_node(11)

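Surprisingly, the connected_components function doesn't work on directed graphs, so we convert the graph to an undirected one, find the nodes that are no longer connected to anything, and then delete them from the directed graph. Note that this only removes nodes left completely isolated (components containing a single node).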

UG = DG.to_undirected()
not_connected_nodes = []
for component in nx.connected_components(UG):
    # A component holding a single node means that node is isolated.
    if len(component) == 1:
        # Components are sets in current networkx, so they cannot be indexed.
        not_connected_nodes.append(next(iter(component)))
for node in not_connected_nodes:
    DG.remove_node(node)  # delete the isolated nodes

If you want to see the result, add to the script the following two lines:

nx.draw(DG)
plt.show()
