10

I am very new to python coding and I am looking for an algorithm that would quickly find all of the paths between a start node and end node for a very large graph - say a graph that has approx 1000 nodes and 10,000 edges. The number of paths that actually exist from the start node to the end node is small - ten or fewer. To help contextualize the question a bit more, consider a social network - If I had 1000 friends and I wanted to know how many ways my best friend from high school connect to my roommate from college, I don't care about the fact that my best friend from high school is connected to all 200 of my high school friends because those paths never lead to my roommate. What I want to do with this python code is quickly subset the paths that exist between my two friends and essentially get rid of all of the "noise" that exists around these two nodes.

I have tried to implement a number of examples of code, all of which work well on small, simple graphs. However, when I try to incorporate them into my large graph analysis, they all take too long to be useful.

Do you all have any suggestions of methods to investigate (i.e., something already created in networkx or even info on using a stack vs. recursion, etc... ), code examples to implement, or even other routes outside of python to pursue? Keep in mind, I am a python newbie.

Mike
  • 23,542
  • 14
  • 76
  • 87
Garen Pledge
  • 113
  • 1
  • 6
  • Translate one of the solutions in this post to python: http://stackoverflow.com/questions/58306/graph-algorithm-to-find-all-connections-between-two-arbitrary-vertices – Adrián Feb 06 '13 at 23:38
  • Also, this: http://stackoverflow.com/questions/8922060/breadth-first-search-trace-path – Adrián Feb 06 '13 at 23:39
  • 1
    The challenge is knowing when you've found them all. I don't think that's possible without examining a huge number of nodes. The algorithm can't neglect your 200 friends, since it can't know (without examining them, and their further friends) that they don't connect to your roommate. Indeed, isn't determining if there are paths through those friends the whole point of running the search? – Blckknght Feb 06 '13 at 23:57
  • Adrian - thanks for the resources! I will definitely check them out. Blckknght - good points - I am running into the same challenges that you are seeing. One thing I am thinking in approaching this is that I want the process to look at all the nodes, but to only output the paths that go from one node to the end. What I think so far is something that will find nodes that are "1 hop" away and then "2 hops" etc... but in order for the algorithm to go quickly, I need to use a hash table/look up array instead of a queue (which I have used so far) - does anyone see problems in this so far? – Garen Pledge Feb 08 '13 at 02:48

2 Answers2

4

Maybe you want all of the simple (no repeated nodes) paths between two nodes? NetworkX has a function for that which is based on depth-first search. See http://networkx.github.com/documentation/development/reference/generated/networkx.algorithms.simple_paths.all_simple_paths.html

The example from there shows that the number of simple paths can be large:

>>> import networkx as nx
>>> G = nx.complete_graph(4)
>>> for path in nx.all_simple_paths(G, source=0, target=3):
...     print(path)
...
[0, 1, 2, 3]
[0, 1, 3]
[0, 2, 1, 3]
[0, 2, 3]
[0, 3]
Aric
  • 24,511
  • 5
  • 78
  • 77
  • 1
    Aric, I've noticed that the current implementation does not scale well on real networks. Is there a potential workaround? – Moses Xu Mar 19 '14 at 03:48
1

Personally I would recommend using a graph database for this. Neo4j or Rexter spring to mind.

When accessing these from Python there are a few libraries available:

Although it would not be impossible to write a fast/scalable Python version of these, there isn't one right now as far as I am aware.

Wolph
  • 78,177
  • 11
  • 137
  • 148