3

Suppose I have a list of regular expressions that I want my input string to match with like :

List<String> regexList = new ArrayList<>();
regexList.add("(.*)apple"); // regex1 - anything ending with apple
regexList.add("smart(.*)"); // regex2 - anything starting with smart

boolean isMatching(input) {
    for (String regex : regexList) {
        if (input.matches(regex)) return true;
    }
    return false;
}

OR

String regexCombined = "((.*)apple)|(smart(.*))";

boolean isMatching(input) {
    return input.matches(regexCombined);
}

Now if have suppose N regex to handle. What will be the time complexities of both the approaches?

dushyantashu
  • 523
  • 3
  • 9
  • 16
  • Have you tred running the code and doing some benchmarks? – Mad Physicist Oct 21 '16 at 12:14
  • I suppose loop of `startsWith` and `endsWith` is much faster. –  Oct 21 '16 at 12:16
  • Without having done any measurement I would guess the combined regexp would take longer to initialize the matcher but the actual matching is faster than with the loop. What is actually overall faster depends also on the length of the input. – Henry Oct 21 '16 at 12:21
  • Possible duplicate of [What is the complexity of regular expression?](http://stackoverflow.com/questions/4378455/what-is-the-complexity-of-regular-expression) – Addison Oct 21 '16 at 12:58
  • Java will have to compile the Regex every time you call `.matches()`, since you are using strings instead of a [`Pattern`](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html), which is only compiled once. However, this is only relevant if you're looking to make it faster - rather than just curious about the time complexity. – Addison Oct 21 '16 at 12:59
  • @Addison Lets say If I use pattern list instead of string list. How would a single ORed pattern perform in comparison with list of patterns? I'm mostly curious about the comparison of two. – dushyantashu Oct 21 '16 at 13:25
  • Sorry for the late reply. I'm not sure how it would compare, since regex engines all function differently. This site (https://swtch.com/~rsc/regexp/regexp1.html) explains how they can differ wildly. But I would say that a combined regular expression will *always* be faster than multiple, if you consider a state machine (which it probably uses) – Addison Oct 21 '16 at 15:24

1 Answers1

0

Both approaches will have complexity of O(N*W*k) where:

  • N is the number of regexes
  • W is the largest complexity among all regexes (in Java regexes can have at most exponential complexity, see here)
  • k is the number of strings you are testing

The run time might differ between the approaches. OR'ing the regexes together would have better performance if the Java regex engine optimizes this case in a meaningful way. If performance is critical, see here for another idea.

James Davis
  • 848
  • 8
  • 12