11

I think String.indexOf(char) is a little faster than String.indexOf(String) when using single character & single String(ex, 'x' & "x")

To make sure my guessing, I wrote easy test code like below.

public static void main(String[] args) {
    IndexOfTest test = new IndexOfTest(Integer.parseInt(args[0]));

    test.run();
}

public IndexOfTest(int loop) {
    this.loop = loop;
}

public void run() {
    long start, end;
    start = System.currentTimeMillis();
    for(int i = 0 ; i < loop ; i++) {
        alphabet.indexOf("x");
    }
    end = System.currentTimeMillis();
    System.out.println("indexOf(String) : " + (end - start) + "ms");

    start = System.currentTimeMillis();
    for(int i = 0 ; i < loop ; i++) {
        alphabet.indexOf('x');
    }
    end = System.currentTimeMillis();
    System.out.println("indexOf(char) : " + (end - start) + "ms");

}

alphabet is String variable that has "abcd...xyzABCD...XYZ".

from this code, I got result table like this...

loop     10^3  10^4  10^5  10^6  10^7

String      1     7     8     9     9

char        1     2     5    10    64

String.indexOf(String) looks like converge to 9ms, however String.indexOf(char) increases exponentially.

I'm very confused. Is there any optimization for using String in this case? Or how I figure out this result?


Update

I ran jmh with below two benchmark method. Each method calls a indexOf method.

@State(Scope.Thread)
public class MyBenchmark {
    private String alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    @Benchmark
    public void indexOfString() {
        alphabet.indexOf("x");
    }

    @Benchmark
    public void indexOfChar() {
    alphabet.indexOf('x');
    }
}

result:

Benchmark                   Mode  Cnt           Score        Error  Units
MyBenchmark.indexOfChar    thrpt   30   142106399.525 ±  51360.808  ops/s
MyBenchmark.indexOfString  thrpt   30  2178872840.575 ± 864573.421  ops/s

This result also show indexOf(String) is faster..

I think that it is time to think about hidden optimization

Any idea?

Chirlo
  • 5,989
  • 1
  • 29
  • 45
parivana
  • 299
  • 2
  • 10
  • 6
    http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java – assylias Nov 11 '15 at 08:37
  • such benchmarks are always doubtful - but the two methods have actually different implementations, so it might be that one is slightly better than the other (I dont know though) – Emerson Cod Nov 11 '15 at 08:49
  • First steps: use System.nanotime() instead of currentMillis and use the results (for example by adding them into an `int sum` or something of the like). Better: use jmh. – assylias Nov 11 '15 at 09:01

1 Answers1

20

Your JMH test is incorrect as you don't consume the result, so the indexOf call can be (or can be not) removed at all by JIT compiler. In your case it seems that JIT-compiler determined that indexOf(String) has no side-effect and removed this call at all, but did not do the same for indexOf(char). Always consume the result (the simplest way is to return it from the benchmark). Here's my version:

import java.util.*;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Fork(3)
public class IndexOfTest { 
    private String str;
    private char c;
    private String s;

    @Setup
    public void setup() {
        str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
        c = 'z';
        s = "z";
    }

    @Benchmark
    public int indexOfChar() {
        return str.indexOf('z');
    }

    @Benchmark
    public int indexOfString() {
        return str.indexOf("z");
    }

    @Benchmark
    public int indexOfCharIndirect() {
        return str.indexOf(c);
    }

    @Benchmark
    public int indexOfStringIndirect() {
        return str.indexOf(s);
    }
}

I test the same thing, but added two indirect tests: when searched char or String is loaded from the field, thus its exact value is unknown during the JIT-compilation. The results are the following (Intel x64):

# JMH 1.11.2 (released 27 days ago)
# VM version: JDK 1.8.0_45, VM 25.45-b02
Benchmark                          Mode  Cnt   Score   Error  Units
IndexOfTest.indexOfChar            avgt   30  25,364 ± 0,424  ns/op
IndexOfTest.indexOfCharIndirect    avgt   30  25,287 ± 0,210  ns/op
IndexOfTest.indexOfString          avgt   30  24,370 ± 0,100  ns/op
IndexOfTest.indexOfStringIndirect  avgt   30  27,198 ± 0,048  ns/op

As you can see, indexOfChar performs in the same way regardless of direct or indirect access. The indexOfString is slightly faster for direct access, but somewhat slower for indirect. That's because indexOf(String) is a JVM intrinsic: its Java code is actually replaced by JIT compiler with efficient inline implementation. For constant string known at JIT compilation time it's possible to generate more efficient code.

In general there's no big difference at least for such short strings. Thus you may use either of these methods for single symbol match.

Tagir Valeev
  • 97,161
  • 19
  • 222
  • 334
  • Just curious, any idea why would the compiler think that `indexOf(String)` has no side effects and not think the same with `indexOf(char)`? – user3367701 Nov 25 '15 at 02:19
  • 1
    I believe it is JIT compiler doing such work, not the compiler. I do remember there are some doc somewhere mentioning what optimization JIT will do for each version of JRE. Sorry cannot point the exact location to you :P – Adrian Shum Nov 25 '15 at 02:32
  • 1
    @user3367701, I think that's because `indexOf(String)` is JIT intrinsic, so JIT just knows in advance everything about this method (including the fact it has no side effect). However `indexOf(char)` is a normal method and JIT should analyze its code to determine whether or not it has the side effect. – Tagir Valeev Nov 25 '15 at 03:16
  • "For constant string known at JIT compilation time it's possible to generate more efficient code." Does this mean that indexOf(static final field) would be faster than indexOf(char) ? – David Mar 23 '17 at 06:57
  • @David, probably so, but may depend on JVM version and may change in future. I would not rely on this fact. – Tagir Valeev Mar 23 '17 at 14:47
  • 2
    @David when I try to dive into a JVM developer’s mind, I’d conclude that the main reason to intrinsify `indexOf(String)` would be the opportunity to do the preparation work for the [Boyer–Moore algorithm](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm) at a call-site with a constant argument and keep it for subsequent calls. For an unpredictable, potentially varying argument, it’s not clear whether such preparation work will pay off. And for a single char string, there is no advantage anyway. So the performance difference of this particular case is only a side effect – Holger Apr 20 '20 at 16:18
  • Definitely depends on the version of Java. The current Java 16 pre-release is faster for indexOfChar than indexOfString. OpenJ9 is roughly the same for both. Hotspot versions _before_ Java 16 are generally slightly faster for the String version. – Paul Wagland Jan 17 '21 at 00:16