From the use of the regex
tag, I deduce that the expression which is to be optimised is a regular expression, not a boolean expression.
Regular expressions can be easily reduced to DFAs (state machines) although the number of states is potentially exponential in the size of the regular expression. Once you have the DFA, you can use one of the well-known algorithms for minimizing DFAs; it's not too difficult to do this in O(n log n)
where n
is the number of states.
A good regex library should be able to do all this for you, but many don't. You might want to take a look at Ragel.
If the DFA for your regular expression is in fact much larger than the regular expression, which is rare but not completely unknown, you might find that the above procedure does blow up memory. For this reason, many regular expression libraries do not perform the full DFA reduction; rather, they reduce only to an NFA and then perform the powerset construction lazily, caching the results. This should avoid memory issues at the cost of increased scan time.
An example of a regular expression which shows exponential blow-up:
A(A|B)(A|B)(A|B)(A|B)(A|B)(A|B)
The corresponding DFA must have at least 32 states (that is, 25 where 5
is the number of repetitions of (A|B)
.