Please allow me to clarify a few details before beginning:
- I've really worked hard to research this problem on my own. I have looked at solutions like this or this but all those solutions assume you have the entire data structure in memory.
- This is for something practical, and is not a homework assignment or job interview question. I've attempted to come up with a solution and have been unable, even though I know it's theoretically possible.
- The specifics: I am using python/Selenium to traverse a Twine file, but I'm showing a generalized pseudocode interface because those details aren't important to solving the problem, only to demonstrate that I need to traverse the path to discover it.
So, I have a Directed Acyclic Graph with about 150 nodes. There is a single start node, and many end nodes.
Imagine I have an interface similar to this to traverse the graph:
class TheGraphToTraverse:
def get_the_first_node(self):
pass
def get_the_list_of_edges_from_the_current_node(self):
"""
Returns list of links/edges that can be followed
"""
def follow_this_edge(self, edge):
"""
Follow an edge link to its destination node
"""
def go_back_one_step(self):
"""
Navigate back one step
"""
def go_back_to_the_start_node(self):
"""
Start over from the beginning node
"""
def is_this_node_an_end_node(self):
"""
Is the current node an end node with no further edges to traverse?
"""
How would I adapt a (recursive or non-recursive) algorhithm like these but without needing to pass the graph itself into the recursive function:
data = {1 : [2,3], # Directed acyclic graph adjacency list
2 : [3],
3 : [4,5],
4 : [5]}
def dfs(data, path, paths = []):
datum = path[-1]
if datum in data:
for val in data[datum]:
new_path = path + [val]
paths = dfs(data, new_path, paths)
else:
paths += [path]
return paths
See how this algorithm (and all I can find online) has you passing the entire data structure in? I don't have the graph as a data structure and need to discover it during traversal, so I need to be able to keep some kind of state on my current pathway, and I'm having trouble seeing clearly how to store that state reliably.
I've gotten this far:
list_of_all_paths = []
g = TheGraphToTraverse()
first_node = g.get_the_first_node()
list_of_all_paths = dfs(g, first_node, list_of_all_paths)
def dfs(g: TheGraphToTraverse, path, all_paths):
# check for end state
if g.is_this_node_an_end_node:
all_paths += [path]
g.go_back_one_step()
# I recognize I'm not sure how to store
# the previous edges I've stored for this path
all_paths = dfs(g, path, all_paths)
else:
edges = g.get_the_list_of_edges_from_the_current_node()
# how do I know which of these I've already tried?
# how do I know if this is a list of
# edges I've already tried?
for edge in edges:
g.follow_this_edge(edge)
all_paths = dfs(g, path, all_paths)
return all_paths
Any help or adapted algorithm, even if it's not in Python, or is not recursive is fine. Thanks so much for engaging with this.