Why not using hashCode() under the hood of equals() to pre-check for equality first?
Quick draft tests:
@Fork(value = 1)
@Warmup(time = 1)
@Measurement(time = 1)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class Main {
@Param({
"o", // differ size
"oooooooooooooooooo1", // same size, differ last symbol
"oooooooooooooooooo2" // same content
})
String string1;
@Param({
"oooooooooooooooooo2"
})
String string2;
@Benchmark
public void stringEquals(Blackhole bh) {
bh.consume(string1.equals(string2));
}
@Benchmark
public void myEquals(Blackhole bh) {
bh.consume(myEquals(string1, string2));
}
boolean myEquals(String str1, String str2){
if (str1.hashCode()==str2.hashCode()) {
return str1.equals(str2);
}
return false;
}
}
Results:
Benchmark (string1) (string2) Mode Cnt Score Error Units
Main.myEquals o oooooooooooooooooo2 avgt 5 5.552 ± 0.094 ns/op
Main.myEquals oooooooooooooooooo1 oooooooooooooooooo2 avgt 5 5.626 ± 0.173 ns/op
Main.myEquals oooooooooooooooooo2 oooooooooooooooooo2 avgt 5 14.347 ± 0.234 ns/op
Main.stringEquals o oooooooooooooooooo2 avgt 5 6.441 ± 1.076 ns/op
Main.stringEquals oooooooooooooooooo1 oooooooooooooooooo2 avgt 5 13.596 ± 0.348 ns/op
Main.stringEquals oooooooooooooooooo2 oooooooooooooooooo2 avgt 5 13.663 ± 0.126 ns/op
As you can see we got great speedup for the case of "same size, differ last symbol".
I think under the hood of String.equals()
check for hashCode()
equality should replace check for length()
equality as it takes the same time:
@Benchmark
public void emptyTest(Blackhole bh) {
bh.consume(0);
}
@Benchmark
public void stringLength(Blackhole bh) {
bh.consume(string2.length());
}
@Benchmark
public void stringHashCode(Blackhole bh) {
bh.consume(string2.hashCode());
}
Benchmark (string2) Mode Cnt Score Error Units
Main.emptyTest oooooooooooooooooo2 avgt 5 3.702 ± 0.086 ns/op
Main.stringHashCode oooooooooooooooooo2 avgt 5 4.832 ± 0.421 ns/op
Main.stringLength oooooooooooooooooo2 avgt 5 5.175 ± 0.156 ns/op
PS I have a feeling my measurements method might be wrong, so any comments are welcome. Also, the hash is saved inside String and that also might produce some misleading results...
UPD1: As @AdamSiemion mentioned we need to recreate string every time a benchmarked method is called to avoid cashing of hash code:
String str1, str2;
@Setup(value = Level.Invocation)
public void setup(){
str1 = string1;
str2 = string2;
}
@Benchmark
public void stringEquals(Blackhole bh) {
bh.consume(str1.equals(str2));
}
@Benchmark
public void myEquals(Blackhole bh) {
bh.consume(myEquals(str1, str2));
}
Benchmark (string1) (string2) Mode Cnt Score Error Units
Main.myEquals o oooooooooooooooooo2 avgt 5 29.417 ± 1.430 ns/op
Main.myEquals oooooooooooooooooo1 oooooooooooooooooo2 avgt 5 29.635 ± 2.053 ns/op
Main.myEquals oooooooooooooooooo2 oooooooooooooooooo2 avgt 5 37.628 ± 0.974 ns/op
Main.stringEquals o oooooooooooooooooo2 avgt 5 29.905 ± 2.530 ns/op
Main.stringEquals oooooooooooooooooo1 oooooooooooooooooo2 avgt 5 38.090 ± 2.933 ns/op
Main.stringEquals oooooooooooooooooo2 oooooooooooooooooo2 avgt 5 36.966 ± 1.642 ns/op
So, we still have almost 30% speedup for the case of "same size, differ last symbol".
UPD2 As @DanielPryden mentioned str1 = string1
will not create new String. So we need to explicitly do so:
@Setup(value = Level.Invocation)
public void setup(){
str1 = new String(string1);
str2 = new String(string2);
}
Benchmark (string1) (string2) Mode Cnt Score Error Units
Main.myEquals o oooooooooooooooooo2 avgt 5 61.662 ± 3.068 ns/op
Main.myEquals oooooooooooooooooo1 oooooooooooooooooo2 avgt 5 85.761 ± 7.766 ns/op
Main.myEquals oooooooooooooooooo2 oooooooooooooooooo2 avgt 5 92.156 ± 8.851 ns/op
Main.stringEquals o oooooooooooooooooo2 avgt 5 30.789 ± 0.731 ns/op
Main.stringEquals oooooooooooooooooo1 oooooooooooooooooo2 avgt 5 38.602 ± 1.212 ns/op
Main.stringEquals oooooooooooooooooo2 oooooooooooooooooo2 avgt 5 38.921 ± 1.816 ns/op
So, now we have what was expected: using hashCode()
will always be slower then equals()
. And that has a total sense (as @Carcigenicate mentioned in comments below): hashCode()
need to do full traversal through char[] to produce the hash. I thought it might be some intrinsic under the hood of hashCode()
that make it faster, but it has not.
Therefore, it's still possible to get some speed up of equals()
if make a check for precalculated hash
existence and compare them:
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String)anObject;
int n = value.length;
if (n == anotherString.value.length
// new code begins
&& (hash==0 || anotherString.hash==0 || hash==anotherString.hash)) {
// new code ends
char v1[] = value;
char v2[] = anotherString.value;
int i = 0;
while (n-- != 0) {
if (v1[i] != v2[i])
return false;
i++;
}
return true;
}
}
return false;
}
We'll get some small(?) slowdown in case of equal strings (for checking hash
fields), but also will get a speedup in case of strings with the same length but different content and already precalculated hashes.
Unfortunately, I can't test it as I can't change the source code of String class.