
I am working with a number of directed acyclic graphs, and I need to find all simple paths between any two nodes. In general I wouldn't worry about the execution time, but I have to do this for very many nodes over very many timesteps - I am dealing with a time-based simulation.

I had tried in the past the facilities offered by NetworkX, but in general I found them slower than my approach. Not sure if anything has changed lately.

I have implemented this recursive function:

import timeit

def all_simple_paths(adjlist, start, end, path):
    """Recursively collect all simple paths from start to end."""
    path = path + [start]
    if start == end:
        return [path]
    paths = []
    for child in adjlist[start]:
        if child not in path:
            child_paths = all_simple_paths(adjlist, child, end, path)
            paths.extend(child_paths)
    return paths


with open('digraph.txt', 'rt') as fid:
    adjlist = eval(fid.read().strip())

number = 1000
stmnt  = 'all_simple_paths(adjlist, 166, 180, [])'
setup  = 'from __main__ import all_simple_paths, adjlist'
elapsed = timeit.timeit(stmnt, setup=setup, number=number)/number
print('Elapsed: %0.2f ms' % (1000*elapsed))

On my computer, I get an average of 1.5 ms per iteration. I know this is a small number, but I have to do this operation very many times.

In case you're interested, I have uploaded a small file containing the adjacency list here:

adjlist

I am using adjacency lists as inputs, coming from a NetworkX DiGraph representation.

Any suggestion for improvements of the algorithm (i.e., does it have to be recursive?) or other approaches I may try are more than welcome.
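In case it helps frame the "does it have to be recursive?" part: the same search can be written with an explicit stack instead of recursion. A rough, hypothetical sketch (the function name and dict-based adjacency list here are illustrative, not my actual code):

```python
def all_simple_paths_iter(adjlist, start, end):
    """Iterative depth-first search: each stack entry is a partial path."""
    paths = []
    stack = [(start,)]            # tuples, so partial paths can be shared cheaply
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == end:
            paths.append(list(path))
            continue
        for child in adjlist[node]:
            if child not in path:  # keep the path simple (no repeated nodes)
                stack.append(path + (child,))
    return paths
```

In my quick tests this tends to perform comparably to the recursive version; the main win is avoiding Python's recursion limit on deep graphs.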

Thank you.

Andrea.

Laurent LAPORTE
Infinity77

1 Answer


You can save time, without changing the algorithm's logic, by caching the results of shared sub-problems.

For example, calling `all_simple_paths(adjlist, 'A', 'E', ())` in the following graph will compute `all_simple_paths(adjlist, 'D', 'E', ...)` multiple times, once for every path that reaches D:

[figure: example graph whose A-to-E paths all pass through D]

Python has a built-in decorator, `functools.lru_cache`, for this task. It hashes the arguments to memoize results, so you will need to change `adjlist` and `path` to tuples, since lists are not hashable.

import timeit
import functools

@functools.lru_cache()
def all_simple_paths(adjlist, start, end, path):
    path = path + (start,)
    if start == end:
        return [path]
    paths = []
    for child in adjlist[start]:
        if child not in path:
            # adjlist is already a tuple, so it can be passed straight through
            child_paths = all_simple_paths(adjlist, child, end, path)
            paths.extend(child_paths)
    return paths


with open('digraph.txt', 'rt') as fid:
    adjlist = eval(fid.read().strip())

# Convert the nested lists to tuples so they are hashable
# (you could also change the data format in the txt file itself)
adjlist = tuple(tuple(pair) for pair in adjlist)

number = 1000
stmnt  = 'all_simple_paths(adjlist, 166, 180, ())'
setup  = 'from __main__ import all_simple_paths, adjlist'
elapsed = timeit.timeit(stmnt, setup=setup, number=number)/number
print('Elapsed: %0.2f ms'%(1000*elapsed))

Running time on my machine:
- original: 0.86ms
- with cache: 0.01ms

Note that this method only helps when there are a lot of shared sub-problems; otherwise the caching overhead outweighs the benefit.
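As a toy illustration (separate from the graph code) of when caching pays off, consider a function whose sub-problems overlap heavily; `cache_info()` makes the reuse visible:

```python
import functools

@functools.lru_cache(maxsize=None)
def fib(n):
    # fib(n-1) and fib(n-2) share almost all of their sub-calls,
    # so nearly every recursive call after the first is a cache hit.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(30)
info = fib.cache_info()  # info.hits counts the reused sub-results
```

If the sub-problems were all distinct, `hits` would stay at zero and the cache would only add lookup overhead.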

CtheSky
  • Thank you CtheSky, I'll make some tests soon against my program (and also compare it with my Cython version of it) and I'll report back. It seems impressive from your timings! – Infinity77 Sep 30 '17 at 10:01
  • I'm not familiar with Cython and it seems that `lru_cache` may not work as explained in this [thread](https://stackoverflow.com/questions/39540599/how-to-apply-decorators-to-cython-cpdef-functions). But I think you can build one from scratch with a hashtable. – CtheSky Sep 30 '17 at 12:57
  • For some reason the approach using the lru cache is enormously slower in my program... like 10 times slower than my Cython translation of my original Python code (!!!) – Infinity77 Oct 01 '17 at 11:22
  • It's because you are using Cython. The thought behind it is to reuse the result if it has been already computed once instead of recomputing it multiple times which may cost a lot. You can apply the cache in Cython too. – CtheSky Oct 01 '17 at 11:37
  • You don't have to implement `lru_cache`, it's only one kind of cache. I mentioned it because it's built-in and I have given a link about why this decorator may not work in your case. – CtheSky Oct 01 '17 at 11:56
  • Sorry, what I meant is: your code exactly as it is versus a Cython implementation that does not use caching at all. The code I posted originally is the Python version of my Cython code, and I was hoping there would be some strategy for which pure-Python (or another approach) could beat my Cython code. The Cython code is simply a line-by-line translation of my original Python code, and it does not use lru cache at all... – Infinity77 Oct 01 '17 at 13:59
  • I thought you were asking for `"Any suggestion for improvements of the algorithm"`, so I gave one. Maybe not a good one, but I tried and tested it and showed an improvement which is reproducible. Yes, we can implement an algorithm in different languages and ways, and the performance varies, but I think that's another kind of "improvement". So I don't understand the meaning of your comparison, since if you really care about performance, why not implement the cache in Cython? Or you can use C/C++, where there are a lot of cache implementations available. – CtheSky Oct 01 '17 at 15:59
  • Thank you CtheSky, I have accepted your answer as the correct solution of my toy problem. It doesn't scale to my real problem but that is just too bad for me, nothing to do with your excellent suggestion. – Infinity77 Oct 01 '17 at 17:32