It is not possible to write a function f()
that takes an arbitrary user-supplied JS regular expression and accurately decides whether or not the regular expression could ever match a string consisting of more than one character. Any function you write will either sometimes return an incorrect result, or you will need to allow for the function to return an "I don't know" result.
There are what amounts to formal proofs of this, but I won't try to present them here. Instead I'll just point to On Lookaheads in Regular Expressions with
Backreferences by Nariyoshi Chida and Tachio Terauchi, which shows that the emptiness problem for the sort of regular expressions that exist JavaScript (which include backreferences and lookahead and lookbehind assertions) is undecidable. That means it is not possible to write a function that will always correctly decide whether an input JS regular expression has any matches at all.
And if there were a magical function f()
to answer the question for length two or more, then you could use it to build an answer to the emptiness problem, by testing the empty string and every string of length one (this is tedious but theoretically possible), and combine the results of that with the magical function to get the full solution to the emptiness problem. Since the emptiness problem is undecidable, so is the problem you've described.
So no, it cannot be done for arbitrary JavaScript regular expressions.
Assuming that's too abstract, imagine the user supplies a specific (horrible) regular expression r
, and let's investigate if we can write a function f()
that can reliably throw an error if and only if r.test(s) === false
for all s
where s.length > 1
. Here's the monster:
const r = /^x(?!x*(?<!^x(?:x?|\1+(xx+)))(?!(?:x?|(xx+?)\2+)$))($|xx(xx)+)$/
I claim that r
will match a string s
if and only if s
satisfies all of these criteria:
it consists only of the letter "x"
. That is, /^x*$/.test(s) === true
, and
its length is an odd number not equal to three. That is, s.length % 2 == 1 && s.length !== 3
, and
its length cannot be written as p+q+1
where p
and q
are prime numbers. That is, assuming you have a function primes(n)
that returns an array of all prime numbers less than n
, then primes(s.length).every(p => primes(s.length-p).every(q => s.length !== p+q+1))
I built r
using the regular expression mentioned in How to determine if a number is a prime with regex? along with lookaheads and lookbehinds. Roughly, it says that there is no point in the string where the number of characters before it is one-plus-a-prime (using look-behind) and where the number of characters after it is a prime (using look-ahead).
I don't know if that convinces you that my claim about what r
does is correct but you can test it if you want. Let's assume for the moment that it is. That means it accepts the input "x"
, since its length is 1, and 1 is not the sum of two prime numbers:
console.log(r.test("x")); // true
So far this doesn't invalidate r
because it's okay if it accepts a one-character string like "x"
.
But: is there a string of two or more "x"
characters that it would accept? Should f(r)
throw an error? Well, that would require we find an odd number larger than three which is not the sum of two primes. Which means we need to find an even number larger than two which is not the sum of two primes.
In other words: f(r)
should not throw an error if and only if every even number greater than two is equal to the sum of two prime numbers. But that's the same as Goldbach's conjecture, a famous unsolved math problem. Mathematicians have been trying for hundreds of years to determine if that is true or false, and we haven't figured it out yet, as of 2023. We think that it is true, and we know that if there is a counterexample it's very large, but it has not been proven.
That means the function f()
would need to be able to prove or disprove Goldbach's conjecture in order to work properly. That by itself doesn't mean it's impossible, but it does mean that nobody currently knows how to do it.
Even if my claim about r
's behavior is incorrect, or if you want to get technical and say that Goldbach's conjecture has been confirmed for all numbers which could possibly be JS string lengths, this should still give you serious pause, since it hopefully demonstrates that one can come up with JS regular expressions where it is not at all clear which strings it might accept.
So, there you go. For arbitrary JS regular expression inputs, it's impossible, and even if it were possible, it would be very difficult.
If you want to restrict the possible inputs to just a subset of the features of JS regular expressions, say by prohibiting backreferences and lookarounds, then the answer would probably change. The emptiness problem for regular languages is decidable, and you could probably use that result to write an algorithm that works for strings of length two or more. But that would be a different question and is out of scope for the question as asked.
Finally, let's step way back and look at what you're trying to do. It is almost certainly more trouble than it's worth to allow a user to supply arbitrary JS regular expressions, if you need to do any sort of validation of them at all.
Instead, you should consider accepting some simpler data structure that cannot be misused (either intentionally or unintentionally). Depending on your use case, you might switch to just a string that holds all the characters you want to accept, or a set of enums corresponding to common character ranges, etc.
Regular expressions are notoriously tricky to work with, as evidenced by the famous aphorism:
Some people, when confronted with a problem, think
“I know, I'll use regular expressions.” Now they have two problems.
If you switch away from regular expressions, you'll cut your number of problems in half.
Playground link to code