
Say I have this code:

public static boolean checkRegex(String regex, String value) {
    if (value == null) return false;
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(value);
    return matcher.matches();
}
.......

checkRegex("^.+@.+\\..+$", "email@example.com");

Will my regex be compiled once at compile time, or compiled again on every call at run time?

Moshe Shaham
  • Why do you think it might only compile once? – Sotirios Delimanolis Jun 17 '15 at 19:28
  • will compile multiple times. And this won't need pattern or matcher classes. – Avinash Raj Jun 17 '15 at 19:29
  • @SotiriosDelimanolis: from a theoretical point of view it is possible, because of constant propagation; but here the chances are great that there are too many side effects anyway. – Willem Van Onsem Jun 17 '15 at 19:37
  • @CommuSoft This is an argument in a public method. Impossible to compile at compile time. – laune Jun 17 '15 at 19:38
  • @laune: well, you can specialize the particular `checkRegex` call at the bottom. This is a technique frequently used in both logic and functional programming. In other words, you create a second method alongside `checkRegex` that takes, for instance, only one parameter, with the first statements omitted since they can be precomputed. – Willem Van Onsem Jun 17 '15 at 19:39
  • @SotiriosDelimanolis: technically speaking it only *fully* compiles once, because of the used *flyweight pattern* by `java.util.regex.*`. – Willem Van Onsem Jun 17 '15 at 19:56
  • @CommuSoft Only if call and method are in the same compilation unit, which is left open. – laune Jun 17 '15 at 20:02
  • You might want to review your accepted answer and its comments. – Sotirios Delimanolis Jun 24 '15 at 04:57

2 Answers


Pattern.compile(regex) is an ordinary method call with no special meaning to the compiler, so it can only be executed at runtime. While it is possible to cache compiled patterns (as some other languages do, for example Python), Sun/Oracle's implementation of Pattern does not cache them. Each time you execute the checkRegex method, even with the same regex, you compile the regex all over again and get a new Pattern instance.
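If repeated compilation matters, the caching has to be done by the caller. The following is a minimal sketch of one way to do that, assuming a small and bounded set of regex strings; the class name CachedRegex, the CACHE field, and the compileCached helper are hypothetical names introduced for illustration and are not part of java.util.regex.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Pattern;

final class CachedRegex {
    // Hypothetical user-level cache (not provided by the JDK):
    // maps a regex string to its compiled Pattern.
    private static final ConcurrentMap<String, Pattern> CACHE = new ConcurrentHashMap<>();

    // Compiles the regex on first use and returns the stored Pattern afterwards.
    static Pattern compileCached(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    // Same contract as the method in the question, but reuses compiled patterns.
    static boolean checkRegex(String regex, String value) {
        if (value == null) return false;
        // Pattern instances are safe to share; a fresh Matcher is created per call.
        return compileCached(regex).matcher(value).matches();
    }

    public static void main(String[] args) {
        String regex = "^.+@.+\\..+$";
        // Two plain compile calls yield two distinct Pattern objects on OpenJDK/Oracle JDK.
        System.out.println(Pattern.compile(regex) == Pattern.compile(regex)); // false
        // The cache hands back the same instance for the same regex string.
        System.out.println(compileCached(regex) == compileCached(regex));     // true
        System.out.println(checkRegex(regex, "email@example.com"));           // true
    }
}
```

The map in this sketch is unbounded, so it only makes sense when the regex strings come from a fixed set rather than from user input.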

By the way, your method

public static boolean checkRegex(String regex, String value) {
    if (value == null) return false;
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(value);
    return matcher.matches();
}

can be rewritten as

public static boolean checkRegex(String regex, String value) {
    if (value == null) 
        return false;
    return value.matches(regex);
}
nhahtdh
Pshemo

Short answer

No. Although it might be possible from a theoretical point of view, the regex is not compiled at compile time; it is compiled again at runtime on each call.

But Pattern uses a Flyweight pattern, such that once a regex is compiled, it is stored in memory, so full compilation at runtime is not necessary.

Compile time

From a theoretical point of view, it is possible for the compiler to perform what is known as constant propagation and thus resolve the problem at compile time. This requires, among other things, that the methods you call are final (or that the callee is otherwise known at compile time).

If, however, you compile your method and inspect the Java bytecode, you get:

public static boolean checkRegex(java.lang.String, java.lang.String);
    Code:
       0: aload_1       
       1: ifnonnull     6
       4: iconst_0      
       5: ireturn       
       6: aload_0       
       7: invokestatic  #15                 // Method java/util/regex/Pattern.compile:(Ljava/lang/String;)Ljava/util/regex/Pattern;
      10: astore_2      
      11: aload_2       
      12: aload_1       
      13: invokevirtual #16                 // Method java/util/regex/Pattern.matcher:(Ljava/lang/CharSequence;)Ljava/util/regex/Matcher;
      16: astore_3      
      17: aload_3       
      18: invokevirtual #17                 // Method java/util/regex/Matcher.matches:()Z
      21: ireturn       

  public static void main(java.lang.String[]);
    Code:
       0: ldc           #18                 // String ^.+@.+\..+$
       2: ldc           #19                 // String email@example.com
       4: invokestatic  #20                 // Method checkRegex:(Ljava/lang/String;Ljava/lang/String;)Z
       7: pop
       ...

As you can see, both methods are simply translated into Java bytecode, and checkRegex is called with the two string constants as parameters; the call to Pattern.compile is still there.

Some compilers (mostly ones for functional and logic programming languages) allow program specialization: if a certain call is made with a constant argument, the compiler can create a copy of that method and resolve whatever is known at compile time. It is a trade-off, however, since introducing a large number of specialized methods will result in a large code base.
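javac does not do this for you, but the same idea can be applied by hand. The sketch below shows what a manually specialized copy of checkRegex might look like for the constant regex from the question; the names SpecializedCheck, EMAIL, and checkEmail are hypothetical. Note that the pattern is still compiled at runtime, just once, when the class is initialized.

```java
import java.util.regex.Pattern;

final class SpecializedCheck {
    // General version, as in the question: compiles the regex on every call.
    static boolean checkRegex(String regex, String value) {
        if (value == null) return false;
        return Pattern.compile(regex).matcher(value).matches();
    }

    // Hand-written "specialization" for one constant regex:
    // the compile step is hoisted into a static initializer and runs once.
    private static final Pattern EMAIL = Pattern.compile("^.+@.+\\..+$");

    // Takes only the remaining non-constant parameter.
    static boolean checkEmail(String value) {
        if (value == null) return false;
        return EMAIL.matcher(value).matches();
    }

    public static void main(String[] args) {
        // Both calls give the same answer; only the second avoids recompiling.
        System.out.println(checkRegex("^.+@.+\\..+$", "email@example.com"));
        System.out.println(checkEmail("email@example.com"));
    }
}
```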

Flyweight pattern at runtime

If you dig into the Java bytecode of java.util.regex.Pattern, you will see:

private static final HashMap<String, CharPropertyFactory> map;

static CharProperty charPropertyFor(String name) {
        // <editor-fold defaultstate="collapsed" desc="Compiled Code">
        /* 0: getstatic     java/util/regex/Pattern$CharPropertyNames.map:Ljava/util/HashMap;
         * 3: aload_0
         * 4: invokevirtual java/util/HashMap.get:(Ljava/lang/Object;)Ljava/lang/Object;
         * 7: checkcast     java/util/regex/Pattern$CharPropertyNames$CharPropertyFactory
         * 10: astore_1
         * 11: aload_1
         * 12: ifnonnull     19
         * 15: aconst_null
         * 16: goto          23
         * 19: aload_1
         * 20: invokevirtual java/util/regex/Pattern$CharPropertyNames$CharPropertyFactory.make:()Ljava/util/regex/Pattern$CharProperty;
         * 23: areturn
         *  */
        // </editor-fold>
    }

Notice the HashMap: Pattern stores fragments of the regex and their corresponding micro-DFA, a behavior known as a Flyweight. This means you only compile the regex once. Evidently you will still have to perform lookups, so this is not a free optimization, but it will definitely help.

Community
Willem Van Onsem
  • I understand now. I was under the false impression that the compiler makes an optimization when it encounters Pattern.compile() with a string expression. I wasn't sure if it's doing it to variables.... – Moshe Shaham Jun 17 '15 at 20:32
  • `This means you only compile the regex once.` No, the code you cite is from `CharPropertyNames` class, which serves as a **factory** for some property `\p` constructs (it stores node templates which are cloned when requested). Pattern does not cache the string and the resulting compiled object at all. – nhahtdh Jun 18 '15 at 03:08
  • Btw, the SO post you cited describes the fact that Pattern and Matcher are decoupled, so a compiled regex can be reused across different Matchers. The programmer, however, is responsible for managing the compiled Pattern. – nhahtdh Jun 18 '15 at 03:21
  • By the way, here is the source code of [Pattern.compile(String) (Java 8)](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/util/regex/Pattern.java#Pattern.compile%28java.lang.String%29). You will eventually arrive at the private method [Pattern.compile()](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/util/regex/Pattern.java#Pattern.compile%28%29), which just goes ahead and compiles the regex. A Pattern consists of Node objects, which are chained together by the `next` field, so you can't reuse the Node objects, with very few exceptions. – nhahtdh Jun 18 '15 at 03:38