The following program takes 1.5 seconds to run on my Intel Core i7 machine:
public static void main(String[] args) {
String regex = ".*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*b.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*x";
long start = System.nanoTime();
boolean match = "aaabccc".matches(regex);
long end = System.nanoTime();
System.out.println((end - start) / 1e9);
}
The regex consists of 30 times ".*" followed by a "b", followed by another 30 times ".*" and finally an "x". The time taken by String.matches
to return is exponential on the number of ".*", i.e. it's O(eⁿ). If you replace 30 by 50, it will take more than a minute to return.
This is concerning because I was thinking of having a public HTTP API where the user would be able to enter a regex. If the regex is coming from outside, it is possible that a malicious user could enter a regex that would cause very high CPU on my server for a long time.
Is this a known issue in the Java API and does anyone know of a workaround, i.e. a way to inspect a regex to determine whether it is dangerous to use?