I have a DAG that looks like this: Example DAG
I want to extract all the paths constituted by 4 nodes in this graph.
My expected result should look like this:
N1 -> N2 -> N3 -> N4
N1 -> N2 -> N3 -> N5
N1 -> N3 -> N4 -> N5
N2 -> N3 -> N4 -> N5
My current attempt looks like this
def path_finder(n1):
paths = []
if DAG.has_node(n1):
for n2 in DAG.successors(n1):
for n3 in DAG.successors(n2):
for n4 in DAG.successors(n3):
paths.append([n1, n2, n3, n4])
return paths
I'm calling this function for each node. DAG
is a global variable, more specifically it is a networkx
object (DAG = networkx.DiGraph()
) This naive function is pathetically slow. Is there a more efficient strategy to do this?
I have looked at question 20262712 but was self-solved by the author of the question in rather obscure way.
Thanks
UPDATE:
Since I couldn't get any satisfactory algorithm to solve this, I ended up parallelizing the job using my naive function as a worker while dumping all the data into a queue. I used pool.imap_unordered
to launch worker function and aggregated the results from queue. It still is slow (takes couple of hours for 5M nodes). I should also furnish the data for the average degree of nodes that I'm dealing with, because that will have an effect upon how fast my workers runs. But, I'll leave that out for now.