4

I want to enumerate all possible strings matched by a regular expression. All the regular expressions I want to match against have no * or + only something like x*{5} which is equivalent to x?x?x?x?x?.

so given any regular expression like the one below:

[a-c]?cdr*{0,2}

i want all strings matching the expression. Thus the library or program shall output something like this:

cd, acd, bcd, ccd, cdr, acdr, bcdr, ccdr, cdrr, acdrr, bcdrr, ccdrr

I don't care about the language it is implemented in as long as it runs in linux.

refinement: if the regular expression is transformed into a deterministic finite automaton the automaton must be representable as a directed acyclic graph. that's why the possible output strings must be enumerable (not infinitely long strings).

Alexander Oh
  • 24,223
  • 14
  • 73
  • 76
  • http://stackoverflow.com/questions/406230/regular-expression-to-match-string-not-containing-a-word – Mithun Sasidharan Nov 14 '11 at 11:24
  • 1
    @Mithun: The linked question is totally unrelated to this one? – Jens Nov 14 '11 at 11:25
  • 3
    This one is closer to what you want http://stackoverflow.com/questions/1248519/how-can-i-expand-a-finite-pattern-into-all-its-possible-matches – Narendra Yadala Nov 14 '11 at 11:29
  • Your question doesn't make sense. Your "regex" consists of a max of 3 characters, unless I am mistaken, yet your results have more than three characters? – FailedDev Nov 14 '11 at 11:32
  • @FailedDev the *{2} operator means 0 or 2 repetitions of the last character. – Alexander Oh Nov 14 '11 at 11:35
  • @Alex Yes. And you have maybe a single a/b/c in the start therefore 3 right? – FailedDev Nov 14 '11 at 11:38
  • @FailedDev uhm what about the cd in the middle? it shall read a prefix: a,b or c followed by a cd followed by at most 2 occurrences of r. making `c cd rr` a 3 character pattern shall match? – Alexander Oh Nov 14 '11 at 11:41
  • 1
    @Alex OK now I get what you meant.. So [abc]?cdr{0,2} is what you need I assume. – FailedDev Nov 14 '11 at 11:43
  • @FailedDev (a|b|c)? and [abc]? should be equivalent. how does this help me enumerate ALL matching strings? – Alexander Oh Nov 14 '11 at 11:45
  • @NarendraYadala thank you for the link, I was hopeing for some library or implementation of this. – Alexander Oh Nov 14 '11 at 11:52
  • Your regex syntax is wrong, as @FailedDev points out. `x*{2}` doesn't mean 0 to 2 repetitions of `x` (at least not in any regex flavor I know); you want `x{0,2}`. – Tim Pietzcker Nov 14 '11 at 12:37
  • @TimPietzcker Even I was getting an error in all languages and I was using RegexBuddy to test it. But POSIX BRE and GNU BRE does not give an error, but they do not work like OP says either. They match `{2}` literally. – Narendra Yadala Nov 14 '11 at 13:04

2 Answers2

3

I think this Java library will help you with this http://code.google.com/p/xeger/ and since it is Java, it would also run on Linux.

Narendra Yadala
  • 9,554
  • 1
  • 28
  • 43
2

Here is a python solution for this problem: https://github.com/asciimoo/exrex

it handles * and + operators as well

asciimoo
  • 631
  • 5
  • 9