find all letters in String with regex

Question

I know about the toCharArray() method, but I am interested in regex. I have a question about the speed of two regexes:

    String s = "123456";
    // Warm up JVM
    for (int i = 0; i < 10000000; ++i) {
        String[] arr = s.split("(?!^)");
        String[] arr2 = s.split("(?<=\\G.{1})");
    }
    long start = System.nanoTime();
    String[] arr = s.split("(?!^)");
    long stop = System.nanoTime();
    System.out.println(stop - start);
    System.out.println(Arrays.toString(arr));
    start = System.nanoTime();
    String[] arr2 = s.split("(?<=\\G.{1})");
    stop = System.nanoTime();
    System.out.println(stop - start);
    System.out.println(Arrays.toString(arr2));

Output:

Run 1:
3158
[1, 2, 3, 4, 5, 6]
3947
[1, 2, 3, 4, 5, 6]

Run 2: 
2763
[1, 2, 3, 4, 5, 6]
3158
[1, 2, 3, 4, 5, 6]

The two regexes do the same job. Why is the first regex faster than the second one? Thanks for your answers.

Take a look at [how-do-i-write-a-correct-micro-benchmark-in-java](http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) — Pshemo, Sep 21 '13 at 14:10
Thanks for your interest. You are a regex fan :) Do you know which regex is faster, and can you explain why? — Melih Altıntaş, Sep 21 '13 at 14:17
I am not sure that one is much faster than the other, because I get different results each time I execute your code. I suspect that `Pattern.compile` can optimize both of your regexes into very similar ones, and that one reason for any difference is the time that optimization takes, since the second regex is a bit more complex. — Pshemo, Sep 21 '13 at 14:31
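One way to reduce the effects mentioned in the comments above is to compile each pattern once and average over many calls. The following is only a rough sketch, not a rigorous benchmark (a harness such as JMH is better suited for that). The class `SplitTiming`, its fields, and the method `averageNanos` are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical timing sketch; class, field, and method names are illustrative.
final class SplitTiming {

    // Compile once so that Pattern.compile is not part of the measured time.
    static final Pattern NOT_AT_START = Pattern.compile("(?!^)");
    static final Pattern AFTER_G_CHAR = Pattern.compile("(?<=\\G.{1})");

    // Written on every iteration so the JIT cannot discard the split calls.
    static volatile int sink;

    // Returns the average time of one split call in nanoseconds.
    static double averageNanos(Pattern pattern, String input, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = pattern.split(input).length;
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        String s = "123456";
        int iterations = 1_000_000;

        // Warm up both patterns before measuring either one.
        averageNanos(NOT_AT_START, s, iterations);
        averageNanos(AFTER_G_CHAR, s, iterations);

        // Several alternating rounds make run-to-run noise visible.
        for (int round = 1; round <= 5; round++) {
            System.out.printf("round %d: (?!^) %.1f ns, (?<=\\G.{1}) %.1f ns%n",
                    round,
                    averageNanos(NOT_AT_START, s, iterations),
                    averageNanos(AFTER_G_CHAR, s, iterations));
        }
    }
}
```

If the two averages swap order between rounds, the difference is probably within measurement noise.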

score 3 · Accepted Answer · answered Sep 21 '13 at 14:37

I can never be 100% sure, but I can think of one reason.

(?!^) always fails or succeeds in one shot (one attempt): it only has to check whether the current position is the start of the string, which is a single test.

As for (?<=\\G.{1}) (which is exactly equivalent to just (?<=\\G.)), it always involves two steps, or two matching attempts.

\\G matches either at the start of the string or at the end of the previous match, and even when it succeeds, the regex engine still has to try to match a single character.

For example, in your string 123456, at the start of the string:

(?!^): fails immediately.
(?<=\\G.): \\G succeeds, but the engine then looks for a character to match . and cannot find one behind the current position, because this is the start of the string, so the lookbehind fails. As you can see, it attempted two steps versus one step for the previous expression.

The same goes for every other position in the input string. Always two tests for (?<=\\G.) versus a single test for (?!^).
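The equivalence of the patterns discussed here can be checked directly. The sketch below is a minimal illustration; the class `GAnchorDemo` and its method names are hypothetical, and only `String.split` and `Arrays.toString` from the JDK are used.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative, not from the question.
final class GAnchorDemo {

    // Splits after each character, anchored at the end of the previous match.
    static String[] splitWithQuantifier(String s) {
        return s.split("(?<=\\G.{1})");
    }

    // Same pattern without the redundant {1} quantifier.
    static String[] splitWithoutQuantifier(String s) {
        return s.split("(?<=\\G.)");
    }

    // Splits at every position except the start of the input.
    static String[] splitNotAtStart(String s) {
        return s.split("(?!^)");
    }

    public static void main(String[] args) {
        String s = "123456";
        // Each line prints [1, 2, 3, 4, 5, 6]
        System.out.println(Arrays.toString(splitWithQuantifier(s)));
        System.out.println(Arrays.toString(splitWithoutQuantifier(s)));
        System.out.println(Arrays.toString(splitNotAtStart(s)));
    }
}
```

This only shows that the results agree for this input; it says nothing about how many steps the engine takes internally.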

Thanks for your answer. If I leave out \\G, which one would you choose: s.split("(?<=.)") or s.split("(?!^)")? — Melih Altıntaş, Sep 21 '13 at 14:45
@MelihAltıntaş The following answer is a guess too, and you need to run a benchmark to be sure, but I think `(?<=^)` is faster than `(?<=.)` because the latter matches everything except a newline `\n`, so it has to check for it. Nevertheless, I think the difference is very, very small. — Ibrahim Najjar, Sep 21 '13 at 14:49
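Apart from speed, the newline behaviour mentioned in the last comment also changes the result: without the DOTALL flag, `.` does not match a line terminator, so `(?<=.)` and `(?!^)` split differently on input that contains `\n`. The sketch below illustrates this; the class `DotVersusCaret` and its method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
final class DotVersusCaret {

    // Splits after any character that '.' can match (no line terminators by default).
    static String[] splitAfterAnyChar(String s) {
        return s.split("(?<=.)");
    }

    // Splits at every position except the start of the input.
    static String[] splitNotAtStart(String s) {
        return s.split("(?!^)");
    }

    public static void main(String[] args) {
        // Without line terminators both patterns give [1, 2, 3].
        System.out.println(Arrays.toString(splitAfterAnyChar("123")));
        System.out.println(Arrays.toString(splitNotAtStart("123")));

        String withNewline = "a\nb";
        // 2 pieces: "a" and "\nb", because there is no split after '\n'.
        System.out.println(splitAfterAnyChar(withNewline).length);
        // 3 pieces: "a", "\n", and "b".
        System.out.println(splitNotAtStart(withNewline).length);
    }
}
```

So the two patterns are interchangeable only when the input is known to contain no line terminators.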
