Give the two regex, A = 0*1* U 1*0* and B = (01 U 10)*, how do I determine if one is subset of the other. I guess one approach is to list some examples out and see if they have anything in common. In this case, I see strings 01, 10 are shared in both set. So it's the case that they are not subset of each other?? How do I know that one regex is subset of the other? In general, how do you go about approaching problems like this?
-
I don't think this is a trivial problem! – karakfa Aug 29 '18 at 19:38
-
https://stackoverflow.com/questions/897595/proofs-about-regular-expressions – JitterbugChew Aug 29 '18 at 19:41
-
Is this related to programming? If it is then please add appropriate tags. – revo Aug 29 '18 at 19:47
-
@CodeDifferent I believe you meant generating the minimal *Deterministic Finite Automaton* which is unique. The AST is the result of parsing, it does not encapsulate the language that generated it. – Olivier Melançon Aug 29 '18 at 19:53
-
This seems to be more about mathematical regular expressions, not programming. [math.se] or [cs.se] would be better places to ask. But I suspect you'll learn that this is at least NP-Complete, and possibly even equivalent to the Halting Problem. – Barmar Aug 29 '18 at 20:02
1 Answers
There are obviously a lot of ways to do this - any logical argument could constitute a valid proof. However, an instructive method of answering this question is to use algorithms to compute an answer to the general question.
Two languages are equal if each contains the other. If one language contains another, the difference of the contained language and the containing language is the empty set. Therefore, if two languages A and B are equal, then A \ B and B \ A are both empty; and if A \ B and B \ A are both empty, then A and B must be equal.
Given a regular expression, there is at least one known correct algorithm to convert it into an equivalent NFA with lambda/epsilon transitions. Such a construction is used in the canonical proof of the equivalence of regular expressions and finite automata.
Given an NFA with lambda/epsilon transitions, there is at least one known correct algorithm to convert it into an equivalent DFA. The subset construction is such an algorithm.
Given two DFAs there is at least one known correct algorithm to produce a DFA which accepts the difference of the languages accepted by those two DFAs. The Cartesian Product Machine construction is such an algorithm.
Given a DFA, there is an algorithm to determine whether it accepts the empty language. DFA minimization followed by a check for any accepting states is a such an algorithm.
Therefore, to algorithmically determine whether two regular expressions r1 and r2 are equal:
- generate a NFA-lambda N1 for r1
- generate a NFA-lambda N2 for r2
- generate a DFA D1 for N1
- generate a DFA D2 for N2
- generate a DFA D12 for L(D1) \ L(D2)
- generate a DFA D21 for L(D2) \ L(D1)
- generate a DFA M12 by minimizing D12
- generate a DFA M21 by minimizing D21
- L(r1) = L(r2) if and only if M12 and M21 both accept the empty language
When it doubt, work it out

- 27,682
- 3
- 38
- 73