As others have said, the benchmark was heavily flawed - performance testing of Java code does not work like that - you must warm it up to ensure that all classes have been loaded and parsed, that all objects have been loaded into memory, and that any compiling down to native code, e.g. via HotSpot, has been done. A naïve benchmark where you just run the code once in the main method is not really going to fly. A much better choice is to use something like JMH. Given the following test:

    package com.stackoverflow.example;

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @Measurement(time = 250, timeUnit = TimeUnit.MILLISECONDS)
    public class MyBenchmark {

        private static final String[] names = new String[]{"jack", "jackson", "jason", "jadifu"};

        // Runs every benchmark in this class in a single forked JVM.
        public static void main(String[] args) throws RunnerException {
            Options opt = new OptionsBuilder()
                    .include(MyBenchmark.class.getSimpleName())
                    .forks(1)
                    .build();
            new Runner(opt).run();
        }

        @Benchmark
        public void contains() {
            names[0].contains("ja");
        }

        // Same call chain as contains(), written out by hand: indexOf on toString().
        @Benchmark
        public void containsExplicit() {
            names[0].indexOf("ja".toString());
        }

        @Benchmark
        public void indexOf() {
            names[0].indexOf("ja");
        }

        @Benchmark
        public void matches() {
            names[0].matches(".*ja.*");
        }
    }

I get the following results:

    Benchmark                      Mode  Cnt     Score    Error   Units
    MyBenchmark.contains          thrpt   20   219.770 ±  2.032  ops/us
    MyBenchmark.containsExplicit  thrpt   20  1820.024 ± 20.583  ops/us
    MyBenchmark.indexOf           thrpt   20  1828.234 ± 18.744  ops/us
    MyBenchmark.matches           thrpt   20     3.933 ±  0.052  ops/us

Now, that's fairly interesting, as it still suggests that `contains` is significantly slower than `indexOf` (roughly eight times slower in this run). However, if I change the test up, very slightly, to the following:

    @Benchmark
    public void contains() {
        assert names[0].contains("ja");
    }

    @Benchmark
    public void containsExplicit() {
        assert names[0].indexOf("ja".toString()) == 0;
    }

    @Benchmark
    public void indexOf() {
        assert names[0].indexOf("ja") == 0;
    }

    @Benchmark
    public void matches() {
        assert names[0].matches(".*ja.*");
    }

I get the following results:

    Benchmark                      Mode  Cnt    Score   Error   Units
    MyBenchmark.contains          thrpt   20  220.480 ± 1.266  ops/us
    MyBenchmark.containsExplicit  thrpt   20  219.962 ± 2.329  ops/us
    MyBenchmark.indexOf           thrpt   20  219.706 ± 2.401  ops/us
    MyBenchmark.matches           thrpt   20    3.766 ± 0.026  ops/us

In this, we're getting the same result for `contains`, but `indexOf` and `containsExplicit` have slowed down to match it. That's a very interesting result. Why is this happening?
Probably because HotSpot recognises that the result of the `indexOf` call is never inspected, and since it's taking a `final` class (`String`), HotSpot is likely able to guarantee that there are no side effects to the call. So if we're not looking at the result and there are no side effects to the call, why are we making it? HotSpot is able to realise that a method call is pointless and remove it altogether (dead-code elimination), which could be what's happening here. It would certainly explain the order of magnitude difference.
Why doesn't this work for `contains`, though? I can only assume that it's because `contains` accepts a `CharSequence`, which is an interface, rather than a `String`, and that's just enough to prevent HotSpot from optimising the method call away.
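
To make the `CharSequence` point concrete, here is a small plain-Java sketch (not a benchmark). `ContainsDemo` and `containsViaIndexOf` are hypothetical names of mine, and the helper only mirrors what I understand `contains` to do in the OpenJDK sources I have looked at (an `indexOf` on `toString()` plus a comparison); treat that as an assumption rather than a guarantee for every JDK.

```java
final class ContainsDemo {

    // Hypothetical helper: my understanding of what String.contains boils down to.
    // The argument is any CharSequence, so toString() is an interface call.
    static boolean containsViaIndexOf(String haystack, CharSequence needle) {
        return haystack.indexOf(needle.toString()) >= 0;
    }

    public static void main(String[] args) {
        String name = "jack";
        CharSequence asString = "ja";                      // a String behind the interface
        CharSequence asBuilder = new StringBuilder("ja");  // a non-String CharSequence

        System.out.println(name.contains(asString));             // true
        System.out.println(name.contains(asBuilder));            // true
        System.out.println(containsViaIndexOf(name, asBuilder)); // true
        System.out.println(name.indexOf("ja"));                  // 0
    }
}
```

The only difference from a direct `indexOf(String)` call is the extra hop through the interface, which fits the guess above about why the JIT treats the two calls differently.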
This also indicates that micro-benchmarks are hard in Java - there is a lot going on beneath the surface to optimise your running code, and a few shortcuts can result in extremely inaccurate benchmarks.
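
One way to avoid the discarded-result trap is to hand every result back to JMH, either by returning it from the `@Benchmark` method or by passing it to a `Blackhole`. This is sturdier than `assert`, which is only evaluated when the JVM runs with assertions enabled (`-ea`). The following is a minimal sketch of how I would restructure the test; the class name `ConsumedResultBenchmark` and the field names are my own, and I have not re-run the numbers with it.

```java
package com.stackoverflow.example;

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

// Hypothetical variant of MyBenchmark: every result is consumed by JMH,
// so the JIT cannot treat the call under test as dead code.
@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class ConsumedResultBenchmark {

    // Non-final instance fields, so the inputs are not compile-time constants
    // that the JIT could fold away.
    private String name = "jack";
    private String needle = "ja";

    @Benchmark
    public boolean contains() {
        return name.contains(needle);   // return values are consumed by JMH
    }

    @Benchmark
    public int indexOf() {
        return name.indexOf(needle);
    }

    @Benchmark
    public void matches(Blackhole bh) {
        bh.consume(name.matches(".*ja.*")); // explicit sink for the result
    }
}
```

Returning the value is the simplest option when a method produces a single result; `Blackhole` is useful when one benchmark method produces several values that all need to stay live.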