1

I would like to compare two regular expressions in Python.

Basically, I need to test if one expression is included in the other one.

For example, is [AB]D included in [AB][CD]. or is ...K.. included in ...[KR]..

I tried something like the following but it doesn't work:

re.finditer(r"[AB][DF]",r"[AB]D")
re.finditer(r"[AB]D",r"[AB][CD]")

My expressions can have different size but a solution with same size expression would be great.

EDIT

All my regular expressions are prety simple.

They only contains "points", "square braquets" and "^".

. means "anything" (like * in real regexps)
[AB] means "A or B"
[^P] means "not P"

EDIT 2

Thanks for your answers and comments, I think I will generate the set of all string from one regexp and test them with the second regular expression.

Cyrlop
  • 1,894
  • 1
  • 17
  • 31
  • You can find if one string is in another one with `if first_string in second_string`. Or do you mean something else when you say "is included in"? – Kevin Aug 27 '14 at 15:42
  • Do you need to solve the *general* problem (any regular expressions) or simply expressions which look like yours, i.e. of the `[AB]D` vs. `[AB][CD]` variety? The latter problem would be a *lot* simpler. – DSM Aug 27 '14 at 15:48
  • The proposed duplicate question is not stellar but the answers are. – tripleee Aug 27 '14 at 16:08
  • @Kevin I mean something else: in python regexp language for example, I mean ...K.. is included in ...[KR].. or [KR] is included into [^P]. – Cyrlop Aug 27 '14 at 16:15
  • @DSM I think I want to solve general stuff but all my regular expressions are prety simple, containing "points", "square braquets" and "^". Points are the common "*" for saying anything, [AB] means A or B and [^P] means "not P". – Cyrlop Aug 27 '14 at 16:19

1 Answers1

5

You can do it, but you'll have to do it yourself. This is a lot of work and you may decide that it's not worth the effort. Here is a way you can do it:

  1. Turn the regular expressions A and B into NFAs.

  2. Let (a, b) be the set of initial states for your two regular expressions in NFA form.

  3. Take the epsilon closure of both sets, (e(a), e(b)).

  4. For each symbol, follow all transitions from e(a) and e(b) along symbol that symbol to form a new state, (a', b').

  5. Go back to step three.

Eventually you will recurse over all possible sets of states for both regular expressions. If at any point e(b) contains a final state but e(a) does not, then B is not contained in A.

This is guaranteed to terminate because there is a finite number of sets of states. This technique will not work with backreferences. Technically if you use backreferences, then they are not regular expressions any more, at least from the viewpoint of formal languages.

Dietrich Epp
  • 205,541
  • 37
  • 345
  • 415
  • +1 This also requires that the expressions are *true* regular expressions, not using any of the look-ahead or look-back assertions available in the extended regular expressions. – chepner Aug 27 '14 at 15:58
  • Thanks for this answer. It,s way to complicated for what I want to do. In my case, I think comparing all possible expression for one regular expression will be easier. As explained in my EDIT, my regexps are very simple. – Cyrlop Aug 27 '14 at 16:25