22

Given a regular expression, how can I list all possible matches? For example: AB[CD]1234, I want it to return a list like: ABC1234 ABD1234

I searched the web, but couldn't find anything.

gmuller
  • 451
  • 1
  • 6
  • 17
  • 1
    Because I have an application that need to combine license plates. And certain chars, have multiple possibilities. Like the example above. – gmuller Nov 03 '09 at 14:12
  • 1
    See http://stackoverflow.com/questions/1248519/how-can-i-expand-a-finite-pattern-into-all-its-possible-matches –  Nov 03 '09 at 14:15
  • I'm willing to bet that this problem is NP-hard. Not willing to spend the time on a proof though :P – Peter Di Cecco Nov 03 '09 at 14:26

9 Answers9

20

Exrex can do this:

$ python exrex.py 'AB[CD]1234'
ABC1234
ABD1234
Sjoerd
  • 74,049
  • 16
  • 131
  • 175
  • 1
    At last an actual answer! Tested this with `pip install --user exrex` and then `exrex 'AB[CD]1234'` -- works exactly as advertised. Thanks! – waldyrious Aug 28 '18 at 11:41
  • 1
    By the way, [regldg](http://regldg.com) also does this -- it's meant to be downloaded, but you can also try it directly in the browser, which is convenient if you need this for a one-off thing. – waldyrious Aug 28 '18 at 11:51
  • is anybody else encountering ``` "File "/Users/ionutmnt/.pyenv/versions/3.11.1/bin/exrex.py", line 25, in from re import sre_parse, U ImportError: cannot import name 'sre_parse' from 're' ``` ? How could I solve that? – M.Ionut Jan 29 '23 at 17:56
  • awesome! a cli as well as a library, nothing more that i can ask! – 0xInfection Jul 08 '23 at 14:53
7

The reason you haven't found anything is probably because this is a problem of serious complexity given the amount of combinations certain expressions would allow. Some regular expressions could even allow infite matches:

Consider following expressions:

AB[A-Z0-9]{1,10}1234

AB.*1234

I think your best bet would be to create an algorithm yourself based on a small subset of allowed patterns. In your specific case, I would suggest to use a more naive approach than a regular expression.

Yannick Motton
  • 34,761
  • 4
  • 39
  • 55
5

It's possible to write an algorithm to do this but it will only work for regular expressions that have a finite set of possible matches. Your regexes would be limited to using:

  • Optional: ?
  • Characters: . \d \D
  • Sets: like [1a-c]
  • Negated sets: [^2-9d-z]
  • Alternations: |
  • Positive lookarounds

So your regexes could NOT use:

  • Repeaters: * +
  • Word patterns: \w \W
  • Negative lookarounds
  • Some zero-width assertions: ^ $

And there are some others (word boundaries, lazy & greedy quantifiers) I'm not sure about yet.

As for the algorithm itself, another user posted a link to this answer which describes how to create it.

Community
  • 1
  • 1
Keith
  • 20,636
  • 11
  • 84
  • 125
  • 1
    Of many incorrect and negative answers to this question, this is a good one. The question was not about expanding any valid regex, just a certain regex, and this answer, and answer from @Sjoerd address this. – Jack Wasey Jul 02 '22 at 12:52
4

For some simple regular expressions like the one you provided (AB[CD]1234), there is a limited set of matches. But for other expressions (AB[CD]*1234) the number of possible matches are not limited.

One method for locating all the posibilities, is to detect where in the regular expression there are choices. For each possible choice generate a new regular expression based on the original regular expression and the current choice. This new regular expression is now a bit simpler than the original one.

For an expression like "A[BC][DE]F", the method will proceed as follows

getAllMatches("A[BC][DE]F")
= getAllMatches("AB[DE]F") + getAllMatches("AC[DE]F")
= getAllMatches("ABDF") + getAllMatches("ABEF") 
   + getAllMatches("ACDF")+ getAllMatches("ACEF")
= "ABDF" + "ABEF" + "ACDF" + "ACEF"
midtiby
  • 14,550
  • 6
  • 34
  • 43
3

Well you could convert the regular expression into an equivalent finite state machine (is relatively simple and can be done algorithmly) and then recursively folow every possible path through that fsm, outputting the followed paths through the machine. It's neither very hard nor computer intensive per output (you will normally get a HUGE amount of output however). You should however take care to disallow potentielly infinite passes (like .*). This can be done by having a maximum allowed path length, after which the tracing is aborted

Grizzly
  • 19,595
  • 4
  • 60
  • 78
0

Impossible.

Really.

Consider look ahead assertions. And what about .*, how will you generate all possible strings that match that regex?

Bart Kiers
  • 166,582
  • 36
  • 299
  • 288
  • 3
    Impossible for any given regex, but definitely possible if you remove some regex features, such as a variable amount of repetition (e.g., `.*`, `.+`, etc)... still, even with those constraints, a difficult but interesting problem. – speedplane Jun 03 '16 at 01:19
  • Infinite possibilities can be detected and displayed to the user as an error. – Skere Oct 17 '18 at 14:23
0

A regular expression is intended to do nothing more than match to a pattern, that being said, the regular expression will never 'list' anything, only match. If you want to get a list of all matches I believe you will need to do it on your own.

Scott Lance
  • 2,239
  • 1
  • 18
  • 19
-1

It may be possible to find some code to list all possible matches for something as simple as you are doing. But most regular expressions you would not even want to attempt listing all possible matches.

For example AB.*1234 would be AB followed by absolutely anything and then 1234.

Kenny Drobnack
  • 171
  • 2
  • 12
-1

I'm not entirely sure this is even possible, but if it were, it would be so cpu/time intensive for many situations that it would not be useful.

For instance, try to make a list of all matches for A.*Z

There are sites that help with building a good regular expression though:

chills42
  • 14,201
  • 3
  • 42
  • 77