How to determine if one regex is subset of another?

Question

Give the two regex, A = 0*1* U 1*0* and B = (01 U 10)*, how do I determine if one is subset of the other. I guess one approach is to list some examples out and see if they have anything in common. In this case, I see strings 01, 10 are shared in both set. So it's the case that they are not subset of each other?? How do I know that one regex is subset of the other? In general, how do you go about approaching problems like this?

https://stackoverflow.com/questions/897595/proofs-about-regular-expressions — JitterbugChew, Aug 29 '18 at 19:41
Is this related to programming? If it is then please add appropriate tags. — revo, Aug 29 '18 at 19:47
@CodeDifferent I believe you meant generating the minimal *Deterministic Finite Automaton* which is unique. The AST is the result of parsing, it does not encapsulate the language that generated it. — Olivier Melançon, Aug 29 '18 at 19:53
This seems to be more about mathematical regular expressions, not programming. [math.se] or [cs.se] would be better places to ask. But I suspect you'll learn that this is at least NP-Complete, and possibly even equivalent to the Halting Problem. — Barmar, Aug 29 '18 at 20:02

score 3 · Accepted Answer · answered Sep 18 '18 at 20:04

There are obviously a lot of ways to do this - any logical argument could constitute a valid proof. However, an instructive method of answering this question is to use algorithms to compute an answer to the general question.

Two languages are equal if each contains the other. If one language contains another, the difference of the contained language and the containing language is the empty set. Therefore, if two languages A and B are equal, then A \ B and B \ A are both empty; and if A \ B and B \ A are both empty, then A and B must be equal.

Given a regular expression, there is at least one known correct algorithm to convert it into an equivalent NFA with lambda/epsilon transitions. Such a construction is used in the canonical proof of the equivalence of regular expressions and finite automata.

Given an NFA with lambda/epsilon transitions, there is at least one known correct algorithm to convert it into an equivalent DFA. The subset construction is such an algorithm.

Given two DFAs there is at least one known correct algorithm to produce a DFA which accepts the difference of the languages accepted by those two DFAs. The Cartesian Product Machine construction is such an algorithm.

Given a DFA, there is an algorithm to determine whether it accepts the empty language. DFA minimization followed by a check for any accepting states is a such an algorithm.

Therefore, to algorithmically determine whether two regular expressions r1 and r2 are equal:

generate a NFA-lambda N1 for r1
generate a NFA-lambda N2 for r2
generate a DFA D1 for N1
generate a DFA D2 for N2
generate a DFA D12 for L(D1) \ L(D2)
generate a DFA D21 for L(D2) \ L(D1)
generate a DFA M12 by minimizing D12
generate a DFA M21 by minimizing D21
L(r1) = L(r2) if and only if M12 and M21 both accept the empty language

When it doubt, work it out

How to determine if one regex is subset of another?

1 Answers1