1

I'm working on a network inspection tool which allows users to match regex over network flows. Its working fine and now I would like to add graphing functionality to it. This is to provide the user a dot graph that represents the internal state machine of the regex.

I found interactive regex debugger and visualizer quite useful and want to have mostly similar Python-based functionality to generate the graph. I've a tool that accepts a custom regex language and converts the expression to such a graph. Since its not compatible with Python re expressions, I can not use it.

Any pointers to how I can convert Python regex to FA state machines?

7h3rAm
  • 1,775
  • 2
  • 15
  • 17
  • Write a regex engine. – millimoose Sep 23 '13 at 18:45
  • "Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow…" See the [help](http://stackoverflow.com/help) for more details. – abarnert Sep 23 '13 at 18:46
  • Anyway this is probably impossible. Python's regular expression engine supports features that require backtracking - I don't believe finite automata can express those constructs. (Even if, the resulting graph would probably be completely insane.) – millimoose Sep 23 '13 at 18:46
  • 3
    Meanwhile, if you look at [Converting regular expressions to finite state machine](http://stackoverflow.com/questions/11426486/converting-regular-expression-to-finite-state-machine) and its Related questions, you'll find a lot of useful information. (It may not actually be a dup, because you're not asking how to do it, but asking for a ready-made tool that does it… but it's certainly related.) – abarnert Sep 23 '13 at 18:47
  • @abarnert I've also asked for "pointers"/"approach" on how to do this. If there is such a tool I would be interested in figuring out how it works. But I'm certainly willing to do it on my own. Thanks for the link in your second comment. – 7h3rAm Sep 24 '13 at 05:32
  • @abarnert I've edited the question to comply with SO questions posting rules. It now is certainly a duplicate of [Converting regular expressions to finite state machine](http://stackoverflow.com/questions/11426486/converting-regular-expression-to-finite-state-machine) and can be marked as such. – 7h3rAm Sep 24 '13 at 05:52
  • @7h3rAm: As far as I know, the only way to do that is to get 5 people to vote to reopen it and then vote to close it again as a dup, which probably isn't going to happen. (If you had enough rep, I think you could do it yourself, but you don't.) So… your comment and the "Linked" link will probably have to serve. – abarnert Sep 24 '13 at 23:33
  • @abarnert Thanks for the clarification. I would like to upvote your comment someday I've enough rep to do so. It will highlight it for those coming here from now on. I've accepted NPE's answer which seems right to the point. – 7h3rAm Sep 25 '13 at 03:51

1 Answers1

3

If you allow the full spectrum of Python regular expressions, this can't be done. The reason is that some expressions supported by re are not regular at all and cannot be represented by a finite automaton. One simple example is r"(a+)b\1", which matches some number of a's, a b, and then the same number of a's as before (see Regular expression implementation details). There is no DFA that admits this expression.

Community
  • 1
  • 1
NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • This seems more like a comment than an answer. – abarnert Sep 23 '13 at 18:48
  • 3
    @abarnert: I think "can't be done" (with an explanation) is a perfectly valid type of answer. :) – NPE Sep 23 '13 at 18:49
  • @NPE I was certain about backtracking issue with FA. Thanks anyways for pointing it out. The linked SO question is of immense help. – 7h3rAm Sep 24 '13 at 05:41