Check if a given regex will match anything

Question

Is it possible to check if a given regular expression will match any string? Specifically, I'm looking for a function matchesEverything($regex) that returns true iff $regex will match any string.

I suppose that this is equivalent to asking, "given a regex r, does there exist a string that doesn't match r?" and I don't think this is solvable without placing bounds on the set of "all strings". I.e., if I assume the strings will never contain "blahblah", then I can simply check if r matches "blahblah". But what if there are no such bounds? I'm wondering if this problem can be solved checking if the regex r is equivalent to .*.

I believe this is equivalent to the [Halting Problem](http://en.wikipedia.org/wiki/Halting_problem). It may not be possible to write an algorithm to determine if an arbitrary regex is equivalent to `.*` — Jim Garrison, Jul 30 '13 at 18:30
Regexes with lookarounds and backrefs, but no code interpolation, should be a subset of or equal to context sensitive grammars. Deciding these languages is not Turing complete, therefore this question shouldn't be equivalent to the halting problem. *If*, given a CSG, we can produce a string of this language by substituting the rules, then we may apply a forbidden substitution, thus producing a string that is not in our language. Sadly I don't know whether this is the case, and I wouldn't be able to sketch a proof of this. — amon, Jul 30 '13 at 19:47
This is called the "Emptiness Problem", and is decidable for DFAs/NFAs (i.e. regexes without backreferences/lookarounds) http://www.cs.miami.edu/~ogihara/csc527/new04-1.pdf For regexes with backrefs (context sensitive grammars), the emptiness problem is undecidable. (I can't find a proof right now, but it's frequently mentioned in the literature) — rmmh, Jul 30 '13 at 20:05

score 13 · Accepted Answer · edited May 23 '17 at 10:31

This doesn't exactly answer your question, but hopefully explains a little why a simple answer is hard to come by:

First, the term 'regex' is a bit murky, so just to clarify, we have:

"Strict" regular expressions, which are equivalent to deterministic finite automatons (DFAs).
Perl-compatible regular expressions (PCREs), which add a bunch of bells and whistles such as lookaheads, backreferences, etc. These are implemented in other languages too, such as Python and Java.
Actual Perl regular expressions, which can get even more crazy, including arbitrary Perl code, via the ?{...} construct.

I think this problem is solvable for strict regular expressions. You just construct the corresponding DFA and search that graph to see if there's any path to a non-accept state. But that doesn't help for 'real world' regex, which is usually PCRE.

I don't think PCRE is Turing-complete (though I don't know - see this question, too: Are Perl regexes turing complete?). If it were, then I think as Jim Garrison commented, this is basically the halting problem. That said, it's not easy to transform them into a DFA, either, making the above method useless...

I don't have an answer for PCREs, but be aware that the aforementioned constructs (backreferences, etc) would make it pretty hard, I imagine. Though I hesitate to say "impossible."

A genuine Perl regex with ?{...} in it is definitely Turing-complete, so there be dragons, and I think you're out of luck.

great reply. you've reinforced what i was thinking. for the use case that i'm addressing, any actual perl regular expressions would be what i care about. pretty much anything where `eval{ 'xx' =~ m/$regex/i; }` results in a successful eval. — Christopher Neylan, Jul 30 '13 at 19:48

Check if a given regex will match anything

1 Answers1

Linked