I added a counter so that the timeout is only checked once every n calls to charAt, which reduces the overhead.
Notes:
Some people pointed out that charAt may not be called frequently enough. I added the foo variable to demonstrate how often charAt is called, and that it is called frequently enough (foo counts timeout checks, and one check happens every checkInterval + 1 calls to charAt). If you're going to use this in production, remove that counter: it decreases performance and could eventually overflow the long on a server that runs for a long time. In this example, charAt is called 30 million times roughly every 0.8 seconds (not measured under proper microbenchmarking conditions; it is just a proof of concept). You can set a lower checkInterval if you want higher precision, at the cost of performance (in the long run, evaluating System.currentTimeMillis() > timeoutTime is more expensive than the if clause that checks the counter).
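The counter trick can be isolated into a small, testable sketch: most calls pay only for an increment and a comparison, and the clock is read once every n calls. A smaller interval means more clock reads (more precise, slower); a larger one means fewer reads but a longer possible overshoot past the deadline. IntervalDeadline, tick() and clockReads() are hypothetical names, not part of the answer's code; the clock is injected as a LongSupplier (an assumption, so a fake clock can be used in tests), and it throws IllegalStateException where the answer's code throws RegexpTimeoutException.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Sketch of the "check the clock only every n calls" idea.
 * Hypothetical helper, not part of the answer's RegularExpressionUtils.
 */
final class IntervalDeadline {
    private final LongSupplier clock;   // time source, injected so a fake clock can be used in tests
    private final long deadline;        // absolute deadline, in the clock's units
    private final int checkInterval;    // read the clock once per checkInterval ticks
    private int ticksSinceCheck = 0;
    private long clockReads = 0;        // how many times tick() actually read the clock

    IntervalDeadline(LongSupplier clock, long timeout, int checkInterval) {
        if (checkInterval < 1) {
            throw new IllegalArgumentException("checkInterval must be >= 1");
        }
        this.clock = clock;
        this.deadline = clock.getAsLong() + timeout;
        this.checkInterval = checkInterval;
    }

    /** Call once per charAt; throws the first time a clock read finds the deadline has passed. */
    void tick() {
        if (++ticksSinceCheck < checkInterval) {
            return; // cheap path: one increment and one comparison
        }
        ticksSinceCheck = 0;
        clockReads++;
        // Subtract-then-compare is the overflow-safe way to compare System.nanoTime values.
        if (clock.getAsLong() - deadline > 0) {
            throw new IllegalStateException("deadline exceeded");
        }
    }

    long clockReads() {
        return clockReads;
    }

    public static void main(String[] args) {
        IntervalDeadline d = new IntervalDeadline(System::nanoTime, TimeUnit.SECONDS.toNanos(5), 1_000);
        for (int i = 0; i < 1_000_000; i++) {
            d.tick();
        }
        System.out.println("clock reads: " + d.clockReads()); // 1000 reads for 1,000,000 ticks
    }
}
```

Note that this sketch reads the clock exactly every checkInterval ticks, whereas the counter in the answer's charAt reads it every checkInterval + 1 calls.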
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.goikosoft.test.RegexpTimeoutException;

/**
 * Creates regular expression matchers that give up after a timeout.
 *
 * Limitations: can only throw a RuntimeException. Decreases performance.
 *
 * Posted by Kris on Stack Overflow.
 *
 * Modified by dgoiko to execute the timeout check only every n chars.
 * Now timeout < 0 means no timeout.
 *
 * @author Kris https://stackoverflow.com/a/910798/9465588
 *
 */
public class RegularExpressionUtils {

    // Demo-only counter of timeout checks (one per checkInterval + 1 charAt calls); remove it in production.
    public static long foo = 0;

    // Demonstrates behavior for a regular expression running into catastrophic backtracking for the given input
    public static void main(String[] args) {
        long millis = System.currentTimeMillis();
        // This checkInterval produces a < 500 ms delay. A higher checkInterval produces a longer delay on timeout.
        Matcher matcher = createMatcherWithTimeout(
                "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "(x+x+)+y", 10000, 30000000);
        try {
            System.out.println(matcher.matches());
        } catch (RegexpTimeoutException e) {
            System.out.println("Operation timed out after " + (System.currentTimeMillis() - millis) + " milliseconds");
        }
        System.out.println("Timeout checks performed: " + foo);
    }

    public static Matcher createMatcherWithTimeout(String stringToMatch, String regularExpression, long timeoutMillis,
            int checkInterval) {
        Pattern pattern = Pattern.compile(regularExpression);
        return createMatcherWithTimeout(stringToMatch, pattern, timeoutMillis, checkInterval);
    }

    public static Matcher createMatcherWithTimeout(String stringToMatch, Pattern regularExpressionPattern,
            long timeoutMillis, int checkInterval) {
        if (timeoutMillis < 0) {
            // Negative timeout: plain matcher without any timeout checks.
            return regularExpressionPattern.matcher(stringToMatch);
        }
        CharSequence charSequence = new TimeoutRegexCharSequence(stringToMatch, timeoutMillis, stringToMatch,
                regularExpressionPattern.pattern(), checkInterval);
        return regularExpressionPattern.matcher(charSequence);
    }

    private static class TimeoutRegexCharSequence implements CharSequence {

        private final CharSequence inner;
        private final long timeoutMillis;
        private final long timeoutTime;
        private final String stringToMatch;
        private final String regularExpression;
        private final int checkInterval;
        private int attempts;

        TimeoutRegexCharSequence(CharSequence inner, long timeoutMillis, String stringToMatch,
                String regularExpression, int checkInterval) {
            this(inner, timeoutMillis, System.currentTimeMillis() + timeoutMillis, stringToMatch,
                    regularExpression, checkInterval);
        }

        // Used by subSequence so sub-sequences share the original deadline instead of restarting the timer.
        private TimeoutRegexCharSequence(CharSequence inner, long timeoutMillis, long timeoutTime,
                String stringToMatch, String regularExpression, int checkInterval) {
            this.inner = inner;
            this.timeoutMillis = timeoutMillis;
            this.timeoutTime = timeoutTime;
            this.stringToMatch = stringToMatch;
            this.regularExpression = regularExpression;
            this.checkInterval = checkInterval;
            this.attempts = 0;
        }

        @Override
        public char charAt(int index) {
            // Read the clock only once every checkInterval + 1 calls; the counter comparison is much cheaper.
            if (this.attempts == this.checkInterval) {
                foo++;
                if (System.currentTimeMillis() > timeoutTime) {
                    throw new RegexpTimeoutException(regularExpression, stringToMatch, timeoutMillis);
                }
                this.attempts = 0;
            } else {
                this.attempts++;
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new TimeoutRegexCharSequence(inner.subSequence(start, end), timeoutMillis, timeoutTime,
                    stringToMatch, regularExpression, checkInterval);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }
}
```
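To see how often the regex engine actually calls charAt, independently of any timeout logic, you can wrap the input in a CharSequence that only counts calls. This is a sketch: CountingCharSequence, countCharAtCalls and xs are hypothetical helpers, and the exact counts depend on the JDK's regex implementation, so none are quoted here.

```java
import java.util.regex.Pattern;

/**
 * Sketch: a CharSequence wrapper that counts how often java.util.regex calls charAt.
 * CountingCharSequence, countCharAtCalls and xs are hypothetical helpers for illustration.
 */
final class CountingCharSequence implements CharSequence {
    private final CharSequence inner;
    private long charAtCalls = 0;

    CountingCharSequence(CharSequence inner) {
        this.inner = inner;
    }

    @Override
    public char charAt(int index) {
        charAtCalls++;
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return inner.subSequence(start, end); // calls on the sub-sequence are not counted
    }

    @Override
    public String toString() {
        return inner.toString();
    }

    /** Runs Matcher.matches() and returns how many times the engine called charAt. */
    static long countCharAtCalls(String regex, String input) {
        CountingCharSequence seq = new CountingCharSequence(input);
        Pattern.compile(regex).matcher(seq).matches();
        return seq.charAtCalls;
    }

    /** Builds a string of n 'x' characters (avoids String.repeat, which needs Java 11). */
    static String xs(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int n = 8; n <= 16; n += 4) {
            System.out.println(n + " chars: x+ -> " + countCharAtCalls("x+", xs(n))
                    + " charAt calls, (x+x+)+y -> " + countCharAtCalls("(x+x+)+y", xs(n)) + " charAt calls");
        }
    }
}
```

When a pattern backtracks, the engine re-reads characters it has already seen, so the number of charAt calls grows faster than the input length; that is why checking only every n calls still catches a runaway match.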
And the custom exception, so you can catch only THAT exception and avoid swallowing other exceptions that Pattern / Matcher may throw.
```java
package com.goikosoft.test;

public class RegexpTimeoutException extends RuntimeException {

    private static final long serialVersionUID = 6437153127902393756L;

    private final String regularExpression;
    private final String stringToMatch;
    private final long timeoutMillis;

    public RegexpTimeoutException() {
        super();
        regularExpression = null;
        stringToMatch = null;
        timeoutMillis = 0;
    }

    public RegexpTimeoutException(String message, Throwable cause) {
        super(message, cause);
        regularExpression = null;
        stringToMatch = null;
        timeoutMillis = 0;
    }

    public RegexpTimeoutException(String message) {
        super(message);
        regularExpression = null;
        stringToMatch = null;
        timeoutMillis = 0;
    }

    public RegexpTimeoutException(Throwable cause) {
        super(cause);
        regularExpression = null;
        stringToMatch = null;
        timeoutMillis = 0;
    }

    public RegexpTimeoutException(String regularExpression, String stringToMatch, long timeoutMillis) {
        super("Timeout occurred after " + timeoutMillis + "ms while processing regular expression '"
                + regularExpression + "' on input '" + stringToMatch + "'!");
        this.regularExpression = regularExpression;
        this.stringToMatch = stringToMatch;
        this.timeoutMillis = timeoutMillis;
    }

    public String getRegularExpression() {
        return regularExpression;
    }

    public String getStringToMatch() {
        return stringToMatch;
    }

    public long getTimeoutMillis() {
        return timeoutMillis;
    }
}
```
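Why the dedicated type matters: a blanket catch (RuntimeException e) would also catch PatternSyntaxException (a subclass of IllegalArgumentException) and report an invalid pattern as a timeout. The sketch below keeps the two cases apart. RegexTimeoutStandIn is a hypothetical stand-in for RegexpTimeoutException so the example compiles on its own, and the timeout is simulated by throwing it directly rather than by running a slow match.

```java
import java.util.function.Supplier;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class CatchDemo {
    /** Hypothetical stand-in for RegexpTimeoutException, so this sketch compiles on its own. */
    static final class RegexTimeoutStandIn extends RuntimeException {
        private static final long serialVersionUID = 1L;

        RegexTimeoutStandIn(String message) {
            super(message);
        }
    }

    /**
     * Runs one match attempt and reports timeouts and invalid patterns separately.
     * Any other RuntimeException propagates to the caller instead of being swallowed.
     */
    static String describe(Supplier<Boolean> matchAttempt) {
        try {
            return "matched=" + matchAttempt.get();
        } catch (RegexTimeoutStandIn e) {
            return "timeout";
        } catch (PatternSyntaxException e) {
            // PatternSyntaxException is also a RuntimeException, so a blanket
            // catch (RuntimeException e) would have misreported this as a timeout.
            return "invalid pattern";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(() -> Pattern.matches("x+", "xxx"))); // matched=true
        System.out.println(describe(() -> Pattern.matches("(", "x")));    // invalid pattern
        System.out.println(describe(() -> {
            throw new RegexTimeoutStandIn("simulated timeout");             // stands in for a real timeout
        }));                                                                 // timeout
    }
}
```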
Based on Andreas' answer. Most of the credit should go to him and his source.