3

Sometimes you may come up with two different regex's for one task. I wonder how you check if two regex's describe the same pattern?

  • Are there some algorithms for that checking?

  • Are there some (online) tools for that checking?

For example, I have two regex's here Can we rewrite lookbehind in terms of the if-then-else?, I would like to know if they are the same.

Thanks.

Community
  • 1
  • 1
Tim
  • 1
  • 141
  • 372
  • 590
  • I would say whoever downvote don't know about regex, and don't want others to learn about it. – Tim Jun 10 '14 at 02:21
  • With Python you can : have a look at this http://stackoverflow.com/questions/21398251/checking-if-two-python-regex-patterns-are-equivalent. – ForguesR Jun 10 '14 at 02:22
  • @ForguesR: I may sometimes need regex in PCRE(PHP) flavor (having more features that the Python flavor), but I don't know about PHP well. So an online tool will be the best. – Tim Jun 10 '14 at 02:25
  • @ForguesR: Not quite. Testing `a+a` and `aa+` give different bytecodes, but they define the same language (i.e. match the same set of strings). – Amadan Jun 10 '14 at 02:25
  • 2
    OP: This is a very hard problem. For compsci (theoretic) regular expressions, http://stackoverflow.com/questions/560263/regular-expressions-equivalence and http://math.stackexchange.com/questions/46975/how-to-prove-two-regular-expressions-are-identical-in-mathematical-way give some resources. I don't know if anything was (or even can be) done about actual "regular expressions" like PCRE, which are not actually regular expressions in the sense they are defined mathematically, owing to their many expansions; I suspect it could be an NP-complete problem, given PCRE power. – Amadan Jun 10 '14 at 02:29
  • @Amadan: What kind of languages are those described by PCRE? Context-free, context-sensitive, or ...? So I may ask this question: how to determine if the xxx languages described by two PCRE regex's are the same? – Tim Jun 10 '14 at 02:31
  • http://cs.stackexchange.com/questions/4839/which-languages-do-perl-compatible-regular-expressions-recognize . And, as to your last question, I think I already answered: I don't think you can. I might be mistaken though. – Amadan Jun 10 '14 at 02:39

1 Answers1

2

Equivalence of regular languages is decidable (see Hopcroft, Motwani, Ullman: Introduction to Automata Theory, Languages and Computation, Chp 4.4) which is also the foundation for minimising the DFA. Intuitively, if the minimised DFAs are equivalent (upto renaming of states), than the languages generated/accepted by the regular languages are the same. So, the answer to your first question is yes.

I'm sure there are online tools, but in the worst case, you can ask 'flex' or equivalent to minimise the automata and you can implement a simple tool, which checks if they can be consistently renamed.

This SO entry is also relevant:

Regular expressions Equivalence

Community
  • 1
  • 1
user1666959
  • 1,805
  • 12
  • 11
  • PCRE that OP is asking about is, AFAIK, not representable by DFA, at least not fully. For example, recursive patterns. – Amadan Jun 10 '14 at 06:11
  • @amadan: the OP was asking about regular languages (first question) and see the carefully qualified end of the first paragraph of the answer :). It is unclear to me how much the OP (1) needs this, (2) how much effort is he willing to make and (3) how much he understands the issues so before dwelling on the weak second-order theory of one or two successors or something equally cryptic I pointed him to the relevant and very well written literature. – user1666959 Jun 10 '14 at 14:06
  • True. You might notice I pointed him the same way some three hours earlier in question comments, so I'm not really disagreeing with you. I am just making sure he understands there is a huge difference between true regular expressions, and what programmers call regular expressions, especially since he explicitly mentioned PCRE flavour. – Amadan Jun 10 '14 at 16:02