Here is one approach that is conceptually simple, if not terribly efficient (not that there is necessarily a more efficient solution...):
- Construct NFAs M and N for regular expressions r and s, respectively. You can do this using the construction introduced in the proof that finite automata describe the same languages.
- Determinize M and N to get M' and N'. We might as well go ahead and minimize them at this point... giving M'' and N''.
- Construct a machine C using the Cartesian product machine construction on machines M'' and N''. Acceptance will be determined by the symmetric difference, or XOR, criterion: accepting states in the product machine correspond to pairs of states (m, n) where exactly one of the two states is accepting in its automaton.
- Minimize C and call the result C'
- If L(r) = L(s)', then the initial state of C' will be accepting and C' will have all transitions originating in the initial state also terminating in the initial state. If this is the case,
Why should this work? The symmetric difference of two sets is the set of everything in exactly one (not both, not neither). If L(s) and L(r) are complementary, then it is not difficult to see that the symmetric difference includes all strings (by definition, the complement of a set contains everything not in the set). Suppose now there were non-complementary sets whose symmetric difference were the universe of all strings. The sets are not complementary, so either (1) their union is non-empty or (2) their union is not the universe of all strings. In case (1), the symmetric difference will not include the shared element; in case (2), the symmetric difference will not include the missing strings. So, only complementary sets have the symmetric difference equal to the universe of all strings; and a minimal DFA for the set of all strings will always have an accepting initial state with self-loops.