I've had success using the following algorithm to compute all-paths data up to a path length of 10 on graphs of ~900 nodes. However, I want to scale it up to larger graphs, and I'm wondering whether there are further optimizations I can make. So far I have:
- After a node has completed its DFS, its paths are saved to a hash table. Should that node be encountered again during another node's DFS, the paths from the hash table are appended so the work is not repeated.
- Nodes are sorted by their degree (highest first). This way, the nodes most likely to be encountered during other searches will already be in the hash table.
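As a standalone sketch of the first optimization: the cache lookup amounts to splicing each cached suffix path onto the current DFS prefix, keeping only combinations that stay within the cutoff and share no nodes with the prefix. The function name and node labels here are invented for illustration:

```python
def splice_cached_paths(prefix, cached_paths, cutoff):
    # combine the current DFS prefix with each path cached for the node
    # just reached; keep only simple paths with at most `cutoff` nodes
    results = []
    for suffix in cached_paths:
        candidate = tuple(prefix) + tuple(suffix)
        # reject combinations that exceed the cutoff or revisit a node
        if len(candidate) <= cutoff and not (set(prefix) & set(suffix)):
            results.append(candidate)
    return results

# e.g. a DFS at A->B reaches cached node C, which stored two paths
print(splice_cached_paths(['A', 'B'], [('C', 'D'), ('C', 'A')], 4))
# only ('A', 'B', 'C', 'D') survives: the second suffix revisits 'A'
```

The disjointness test is what keeps spliced paths simple; without it, cached suffixes could route back through nodes already on the prefix.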
Specifically, the algorithm builds a network of chemicals (nodes) and reactions (edges) and precomputes paths so the network can later be searched much faster for theoretical pathways, which can then be tested experimentally.
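To make the setting concrete, here is a minimal stand-in for such a network, using plain dicts rather than networkx so it runs standalone; the chemical names are invented for illustration and are not from the real dataset:

```python
# chemicals as nodes, each reaction contributing a directed edge
# from a reactant to a product (names invented for this example)
network = {
    'glucose': ['pyruvate'],
    'pyruvate': ['lactate', 'acetyl-CoA'],
    'lactate': [],
    'acetyl-CoA': [],
}

def simple_paths(graph, source, cutoff):
    # enumerate every simple path with at most `cutoff` nodes,
    # starting at `source`, via a plain recursive DFS
    paths = []
    def dfs(path):
        paths.append(tuple(path))
        if len(path) < cutoff:
            for succ in graph.get(path[-1], []):
                if succ not in path:  # keep the path simple
                    dfs(path + [succ])
    dfs([source])
    return paths

print(simple_paths(network, 'glucose', cutoff=3))
# four paths: the source alone plus its downstream extensions
```

Precomputing all such paths per node is what the memoization table stores, so later pathway queries become lookups instead of fresh searches.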
The algorithm is based on the networkx all_simple_paths algorithm:
import csv
from operator import itemgetter

def _all_simple_paths_graph(DG, cutoff):
    if cutoff < 1:
        return
    memorizedPaths = {}
    # sort nodes into highest -> lowest degree order, so high-degree nodes
    # (the ones most likely to be re-encountered) are memoized first
    # (in networkx >= 2.0, degree_iter() is replaced by degree())
    degreelist = sorted(DG.degree_iter(), key=itemgetter(1), reverse=True)
    # degree_iter() yields (node, degree) tuples; unpack them directly
    # instead of parsing their string representation with a regex
    nlist = [node for node, degree in degreelist]
    with open('PythonOutput.txt', "wb") as csvfile:
        writer = csv.writer(csvfile, delimiter=' ', quotechar='|')
        numberdone = 0
        # for each node, start a new DFS
        for source in nlist:
            print source
            print numberdone
            numberdone += 1
            uniqueTreePaths = []
            visited = [source]
            stack = [iter(DG[source])]
            while stack:
                children = stack[-1]
                child = next(children, None)
                if child is None:
                    stack.pop()
                    visited.pop()
                # if a node has been searched before, splice in its cached paths
                elif child in memorizedPaths:
                    # the path ending at the memoized child itself is not in
                    # the cache (cached paths have length >= 2), so add it here
                    if child not in visited and len(visited) < cutoff:
                        uniqueTreePaths.append(tuple(visited) + (child,))
                    for path in memorizedPaths[child]:
                        newPath = tuple(visited) + tuple(path)
                        # keep only combinations within the cutoff that do not
                        # revisit a node already on the current path
                        if len(newPath) <= cutoff and not (set(visited) & set(path)):
                            uniqueTreePaths.append(newPath)
                elif len(visited) < cutoff:
                    if child not in visited:
                        visited.append(child)
                        stack.append(iter(DG[child]))
                        newPath = tuple(visited)
                        # compare tuples to tuples; comparing the visited list
                        # against stored tuples never matches
                        if newPath not in uniqueTreePaths:
                            uniqueTreePaths.append(newPath)
                else:  # len(visited) == cutoff
                    # drain the remaining children too; popping after only the
                    # first child would drop its siblings at maximum depth
                    for c in [child] + list(children):
                        if c not in visited:
                            newPath = tuple(visited + [c])
                            if newPath not in uniqueTreePaths:
                                uniqueTreePaths.append(newPath)
                    stack.pop()
                    visited.pop()
            # for each node, write its paths to disk to save RAM
            for path in uniqueTreePaths:
                writer.writerow(path)
            # add this node's paths to the hash table
            memorizedPaths[source] = uniqueTreePaths
If anyone has any suggestions for further optimizing the algorithm, it would be greatly appreciated.