2

Given a directed possibly cyclic networkx graph. I would like to search for paths which are described as a regular expression. The regexes will not describe cycles. An example would be:

nodeA, [^nodeB]* [nodeC|nodeD]

In english this should correspond to: I am looking for a path starting with nodeA,followed by zero, one or many nodes which are not nodeB, and finally the path should end with either nodeC or nodeD.

Example paths I would like to find would be:

  • nodeA->nodeX->nodeC
  • nodeA->nodeX->nodeY->nodeD
  • nodeA->nodeC
  • ...

I found this related post on SO. Yet, it is about creating a regex from all possible paths between two nodes. In my case, I provide a regex and would like to get the corresponding paths.

My incomplete (and probably insufficient) approach is this:

  1. Split the regex-described path up into parts starting with a single fixed node and ending with a single fixed node. Single fixed node means some_node (i.e. no special regex characters such as [],^, *, + etc.).
  2. Search for this path using regular networkx functionality
  3. If a path is found, I go through it node by node and look at the "current" regex part (starting at the beginning of the regex).
  4. If the current regex part is a [^some_node] then I check if the current node is equal to that. I break if it is equal and advance the node and the regex part if not.
  5. If the current regex part is a some_node , i.e. the node some_node must be there then I check if the current node is equal to that. If it is, I advance the node and the regex. If not, I only advance the node.

Before writing more code for the other cases, e.g. [some_node]* or [some_node]+ I wanted to check if there exists some code or algorithm for this task already.

Is there any read-to-use module/lib/sample code or the like for this task ?
If not, do you know of an existing algorithm "regex to path search in graph"?
If not, could you outline an algorithm ?

Community
  • 1
  • 1
langlauf.io
  • 3,009
  • 2
  • 28
  • 45
  • 2
    You should checkout [Neo4j](http://neo4j.com/). It is a graph NoSQL database that uses [CQL](http://neo4j.com/docs/stable/cypher-query-lang.html) (Cypher Query Language). It pretty much is a form of specifying regex for paths. Maybe it will give you some insights. – cenouro Nov 06 '15 at 16:11
  • 1
    @cenouro : Excellent hint, thanks. After having read a bit about graph databases, I am wondering if CQL or gremlin is better suited for my problem. What do you think? – langlauf.io Nov 06 '15 at 17:21
  • 1
    I have never heard about Gremlim, so I can't give my opinion. Also, I haven't used Neo4j, only read the docs and done some tutorials. They seem to be a solid choice for you, but YMMV. Another lib you might be interested is [igraph](http://igraph.org/redirect.html). It is widely adopted by Python devs and depending on the size of your graphs, Neo4j/Gremlim may be overkill. – cenouro Nov 06 '15 at 17:39
  • @cenouro Thanks for the pointer to igraph. – langlauf.io Nov 06 '15 at 17:42

0 Answers0