I'm trying to compare the performance of splitting with a precompiled `Pattern` against a plain `String.split` call with a regex string, using the code below. I use JUnit (with junit-benchmarks) for the benchmark, and it looks like the non-precompiled code runs faster.

Any ideas why? Am I running the tests correctly?

Class code:

import java.util.regex.Pattern;

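public class StringSplitter {
    private static final String REGEX_STRING = "\\s+";

    // Compiled once per instance and reused by precompiledTest.
    private final Pattern spacesPattern;

    public StringSplitter() {
        this.spacesPattern = Pattern.compile(REGEX_STRING);
    }

    // Passes the regex as a string, so it is handled on every call.
    public String[] nonPrecompiledTest(String bigS) {
        return bigS.split(REGEX_STRING);
    }

    // Reuses the Pattern compiled in the constructor.
    public String[] precompiledTest(String bigS) {
        return this.spacesPattern.split(bigS);
    }
}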

And the tests code:

import org.junit.BeforeClass;
import org.junit.Test;
import com.carrotsearch.junitbenchmarks.AbstractBenchmark;

public class PatternBenchmarkTest extends AbstractBenchmark {

    private static String bigS;
    private static StringSplitter pt;

    @BeforeClass
    public static void prepare() {
        // 10,000,000 chars of alternating 'a' and ' '; the length is even, so i + 1 stays in bounds.
        char[] bigString = new char[10000000];
        for (int i = 0; i < bigString.length; i = i + 2) {
            bigString[i] = 'a';
            bigString[i + 1] = ' ';
        }
        bigS = new String(bigString);
        System.out.println("Created big string");

        pt = new StringSplitter();
        System.out.println("Created test class");
    }

    @Test
    public void testNonPrecompiled() {
        pt.nonPrecompiledTest(bigS);
    }

    @Test
    public void testPrecompiled() {
        pt.precompiledTest(bigS);
    }
}

And I get the following result:

Created big string
Created test class
PatternBenchmarkTest.testPrecompiled: [measured 10 out of 15 rounds, threads: 1 (sequential)]
 round: 0.33 [+- 0.04], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 9, GC.time: 0.76, time.total: 8.83, time.warmup: 5.56, time.bench: 3.27
PatternBenchmarkTest.testNonPrecompiled: [measured 10 out of 15 rounds, threads: 1 (sequential)]
 round: 0.28 [+- 0.04], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 5, GC.time: 0.27, time.total: 4.37, time.warmup: 1.56, time.bench: 2.82
  • It is not about the benchmarking, but about the varying performance of the regex with or without precompiled patterns. –  May 25 '17 at 22:56
  • But the varying performance is a result of bad benchmarking code, so it *is* about benchmarking. – Andreas May 25 '17 at 22:59
  • So is the precompiled pattern faster? –  May 25 '17 at 23:02
  • You can't run a benchmark this way. You can't compare single system call times and expect them to represent anything more than a snapshot of the OS's running processes. Even though you're not running the same system calls, you're still just seeing the influence of background processing. You actually have to loop thousands of times on each call separately, taking the time before the loop and after it. –  May 26 '17 at 01:49
  • To answer your question as to which is faster though: the precompiled regex doesn't have to be parsed and turned into a state machine, so of course it's faster. Btw, even if you don't have a big regex, just creating the state machine requires significant overhead. Also, why are you using `split()` to benchmark this principle? The split function itself may add overhead of its own, in how it creates and uses the regex, beyond plain regex matching. –  May 26 '17 at 01:53
  • So I have changed the strategy. Now I use JUnit for benchmarking, but it seems like the nonPrecompiled code runs faster (see the code above). What would be the correct strategy to test this simple code, in your view? –  May 26 '17 at 09:15
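
The looping approach suggested in the comments could be sketched roughly as follows. This is only a minimal illustration, not a rigorous benchmark: the class name `SplitTimingSketch`, the helper `buildInput`, the input size of 10 tokens, and the iteration count are all assumptions made for the example. It uses a short input on purpose. With the 10,000,000-character string from the question, the splitting work itself probably dominates each round, so the one-time cost of compiling `\\s+` would be hard to see either way (as far as I know, `String.split` with such a regex compiles the pattern internally and then splits, so the two methods do nearly the same work per call).

```java
import java.util.regex.Pattern;

class SplitTimingSketch {
    private static final String REGEX_STRING = "\\s+";
    private static final Pattern SPACES = Pattern.compile(REGEX_STRING);

    // Hypothetical helper: builds "a a a ... " containing the given number of tokens.
    static String buildInput(int tokens) {
        StringBuilder sb = new StringBuilder(tokens * 2);
        for (int i = 0; i < tokens; i++) {
            sb.append('a').append(' ');
        }
        return sb.toString();
    }

    // Average nanoseconds per call when the regex is passed as a string each time.
    static long timeNonPrecompiled(String input, int iterations) {
        long sink = 0; // accumulates results so the loop body is not trivially dead code
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += input.split(REGEX_STRING).length;
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // never true in practice; only keeps sink observable
        }
        return elapsed / iterations;
    }

    // Average nanoseconds per call when a single compiled Pattern is reused.
    static long timePrecompiled(String input, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += SPACES.split(input).length;
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink);
        }
        return elapsed / iterations;
    }

    public static void main(String[] args) {
        String input = buildInput(10);   // assumed small input so compile cost is visible
        int iterations = 200_000;        // assumed iteration count

        // Warm-up passes so the JIT has a chance to compile both code paths first.
        timeNonPrecompiled(input, iterations);
        timePrecompiled(input, iterations);

        System.out.println("non-precompiled ns/call: " + timeNonPrecompiled(input, iterations));
        System.out.println("precompiled ns/call:     " + timePrecompiled(input, iterations));
    }
}
```

Hand-rolled timing loops like this are still sensitive to JIT and GC effects, so the numbers should be read as a rough indication only. A dedicated harness such as JMH is generally the more reliable choice for this kind of micro-benchmark.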
