5

is there any way to get all the possible outcomes of a regular expression pattern?. everything I've seen refers to a pattern that is evaluated against a string. but what I need is to have a pattern like this:

^EM1650S(B{1,2}|L{1,2})?$

generate all possible matches:

EM1650S
EM1650SB
EM1650SBB
EM1650SL
EM1650SLL
John Saunders
  • 160,644
  • 26
  • 247
  • 397
etasoft
  • 51
  • 2
  • You don't see this very often because the vast majority of regex patterns tend to have an infinite number of possible outcomes. – Amber Jan 27 '12 at 19:26
  • 2
    Not infinite. Just humongous. – Alfabravo Jan 27 '12 at 19:27
  • 2
    @Alfabravo - Many (most?) RegExs I've seen have a `*` in them somewhere and if not "0 or more", usually a "1 or more" token. Those tokens typically give you an infinite list, since "or more" == "infinite". – cdeszaq Jan 27 '12 at 19:29
  • Sounds very hard, most regexps can match an infinite number of strings – Ruan Mendes Jan 27 '12 at 19:30
  • Here's another thread on the topic: http://stackoverflow.com/questions/7614962/create-set-of-all-possible-matches-for-a-given-regex – Dan A. Jan 27 '12 at 19:35
  • By the way, if the Regex is not surrounded by `^$` you don't even need `*` or `+` to make it have an infinite number of matches – Ruan Mendes Jan 27 '12 at 19:38
  • @JuanMendes - The `^` and `$` tokens just indicate, essentially, another specific character. _not_ having those doesn't indicate an infinite set of matches at all. You _still_ need a "X or more"-type token to get an infinite match. – cdeszaq Jan 27 '12 at 19:46
  • @cdeszaq; That's not correct. for the pattern `/ab/`, the following strings match it. Xab, XXab, XXXab, XXXab... See what I mean? http://jsfiddle.net/mendesjuan/zaae4/ – Ruan Mendes Jan 27 '12 at 19:50
  • @JuanMendes - Depending on what you mean by match, yes you are correct. I interpreted "match" to mean "give me the text that matches", not "does this text match". With my interpretation, the result would be "ab" in all cases, therefor, not infinite. – cdeszaq Jan 27 '12 at 19:55
  • well, the token ^ always mark the begin of pattern and $ idem the end, so there be a finite numbers of combinations as result – etasoft Jan 27 '12 at 20:32
  • Please consider migrating to cstheory – Greg Mattes Jan 27 '12 at 20:38
  • Possible duplicate of [Generate a random string based on a regular expression](https://stackoverflow.com/questions/8959850/generate-a-random-string-based-on-a-regular-expression) – Anderson Green Feb 10 '18 at 20:33

4 Answers4

2

In the general case, no. In this case, you have almost no solution space.

There's a section covering this in Higher Order Perl (PDF) and a Perl module. I never re-implemented it in anything else, but I had a similar problem and this solution was adequate for similarly-limited needs.

Dave Newton
  • 158,873
  • 26
  • 254
  • 302
2

There are tools that can display all possible matches of a regex.

Here is one written in Haskell: https://github.com/audreyt/regex-genex and here is a Perl module: http://metacpan.org/pod/Regexp::Genex

Unfortunately I couldn't find anything for JavaScript

szabgab
  • 6,202
  • 11
  • 50
  • 64
Kai Sternad
  • 22,214
  • 7
  • 47
  • 42
  • 1
    Interesting. Regexp-Genex limits `*` to `{0,3}` and `+` to `{1,4}`, ostensibly to make it more pretty, but that has the side effect of removing the infinite solution space that most regexes have. – Lily Ballard Jan 27 '12 at 19:33
  • 1
    Cool, the genex assume that `*` is `{0,3}` and `+` is `{1,4}` and therefore produces a finite number of permutations. Wow this is my second comment that I'm late to and ends up being a copy of someone else's comment. – Ruan Mendes Jan 27 '12 at 19:35
  • Yes, it's not perfect. Here's a discussion thread on Reddit about this topic: http://www.reddit.com/r/programming/comments/hya57/from_a_regex_generate_all_possible_strings_it_can/ – Kai Sternad Jan 27 '12 at 19:41
  • yes, thanks, i will review, in fact i need a tool in javascript, asp, asp.net (c# or vb) or php – etasoft Jan 27 '12 at 20:53
1

In this particular case, yes. The regex generates a finite number of valid string, so they can be counted up.

You'll just have to parse the regex. Some part of that (EM1650S) is mandatory, so think for the rest. Parse by the | (or) symbol. Then enumerate the strings for both sides of it. Then you can get all possible combinations of them.

Some regex (containing * or + symbols) can represent an infinite number of strings, so they cannot be counted.

Sufian Latif
  • 13,086
  • 3
  • 33
  • 70
1

From a computational theoretic standpoint, regular expressions are equivalent to finite state machines. This is part of "automata theory." You could create a finite state machine that is equivalent to a regular expression and then use graph traversal algorithms to traverse all paths of the FSM. In the general case a countably infinite number of strings may match a regular expression, so your program may never terminate depending on the input regular expression.

Greg Mattes
  • 33,090
  • 15
  • 73
  • 105