A more reasonable benchmark would be:
```java
import java.math.BigDecimal;
import java.util.Random;

public abstract class Benchmark {

    final String name;

    public Benchmark(String name) {
        this.name = name;
    }

    // Executes the code under test `iterations` times and returns the computed sum.
    abstract int run(int iterations) throws Throwable;

    // Returns the average time per iteration, in nanoseconds.
    private BigDecimal time() {
        try {
            int nextI = 1;
            int i;
            long duration;
            // Grow the iteration count (1, 3, 7, 15, ...) until one measurement takes at
            // least one second; the shorter runs before it also give the JIT time to warm up.
            do {
                i = nextI;
                long start = System.nanoTime();
                run(i);
                duration = System.nanoTime() - start;
                nextI = (i << 1) | 1;
            } while (duration < 1000000000 && nextI > 0); // nextI > 0 stops once growth would overflow int
            // Nanoseconds per iteration, with three decimal places.
            return new BigDecimal(duration * 1000 / i).movePointLeft(3);
        } catch (Throwable e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public String toString() {
        return name + "\t" + time() + " ns";
    }

    // Fisher-Yates shuffle: puts the elements of `a` into a uniformly random order.
    private static void shuffle(int[] a) {
        Random chaos = new Random();
        for (int i = a.length; i > 0; i--) {
            int r = chaos.nextInt(i);
            int t = a[r];
            a[r] = a[i - 1];
            a[i - 1] = t;
        }
    }

    public static void main(String[] args) throws Exception {
        // table[i] == i * i, so a lookup yields the same value as the multiplication;
        // permutation holds the indices 0..999 in random order.
        final int[] table = new int[1000];
        final int[] permutation = new int[1000];
        for (int i = 0; i < table.length; i++) {
            table[i] = i * i;
            permutation[i] = i;
        }
        shuffle(permutation);

        Benchmark[] marks = {
            new Benchmark("sequential multiply") {
                @Override
                int run(int iterations) throws Throwable {
                    int sum = 0;
                    for (int j = 0; j < iterations; j++) {
                        for (int i = 0; i < table.length; i++) {
                            sum += i * i;
                        }
                    }
                    return sum;
                }
            },
            new Benchmark("sequential lookup") {
                @Override
                int run(int iterations) throws Throwable {
                    int sum = 0;
                    for (int j = 0; j < iterations; j++) {
                        for (int i = 0; i < table.length; i++) {
                            sum += table[i];
                        }
                    }
                    return sum;
                }
            },
            new Benchmark("random order multiply") {
                @Override
                int run(int iterations) throws Throwable {
                    int sum = 0;
                    for (int j = 0; j < iterations; j++) {
                        for (int i = 0; i < table.length; i++) {
                            // Squares the shuffled indices instead of looking them up.
                            sum += permutation[i] * permutation[i];
                        }
                    }
                    return sum;
                }
            },
            new Benchmark("random order lookup") {
                @Override
                int run(int iterations) throws Throwable {
                    int sum = 0;
                    for (int j = 0; j < iterations; j++) {
                        for (int i = 0; i < table.length; i++) {
                            // The shuffled indices make the reads from table jump around.
                            sum += table[permutation[i]];
                        }
                    }
                    return sum;
                }
            }
        };
        for (Benchmark mark : marks) {
            System.out.println(mark);
        }
    }
}
```
which prints on my Intel Core Duo (yes, it's old):
```
sequential multiply 2218.666 ns
sequential lookup 1081.220 ns
random order multiply 2416.923 ns
random order lookup 2351.293 ns
```
So, if I access the lookup array sequentially, which minimizes the number of cache misses and permits the HotSpot JVM to optimize bounds checking on array access, the lookup is about twice as fast as the multiplication on an array of 1000 elements (1081 ns versus 2219 ns per pass). If we access the array in random order, that advantage disappears. Also, the lookup gets slower as the table grows. For instance, with 10000 elements I get:
```
sequential multiply 23192.236 ns
sequential lookup 12701.695 ns
random order multiply 24459.697 ns
random order lookup 31595.523 ns
```
So, array lookup is not faster than multiplication, unless the access pattern is (nearly) sequential and the lookup array small.
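To put numbers on that conclusion, here is a small sketch that divides each lookup timing by the matching multiplication timing, using only the figures printed above. The class and method names are made up for illustration, and the inputs are single runs on one machine, so treat the ratios as rough. A ratio below 1 means the lookup was faster.

```java
final class LookupVsMultiply {
    // Timings in ns per pass over the table, copied from the two runs above:
    // {sequential multiply, sequential lookup, random order multiply, random order lookup}
    static final double[][] TIMINGS = {
        {2218.666, 1081.220, 2416.923, 2351.293},     // 1000 elements
        {23192.236, 12701.695, 24459.697, 31595.523}, // 10000 elements
    };
    static final int[] SIZES = {1000, 10000};

    // Returns {sequential lookup / multiply, random order lookup / multiply}.
    static double[] lookupToMultiplyRatios(double[] t) {
        return new double[] {t[1] / t[0], t[3] / t[2]};
    }

    public static void main(String[] args) {
        for (int s = 0; s < SIZES.length; s++) {
            double[] r = lookupToMultiplyRatios(TIMINGS[s]);
            System.out.printf("%d elements: sequential %.2f, random order %.2f%n",
                SIZES[s], r[0], r[1]);
        }
    }
}
```

With 1000 elements both ratios come out below 1 (the random-order one only barely), while with 10000 elements the random-order ratio rises above 1, which is where the lookup starts losing.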
In any case, my measurements indicate that a multiplication (and addition) takes only about 4 to 5 processor cycles (roughly 2.3 ns per loop iteration on a 2 GHz CPU). You're unlikely to get much faster than that. Also, unless you perform on the order of half a billion multiplications per second, the multiplications are not your bottleneck, and optimizing other parts of the code will be more fruitful.
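The cycle estimate is simple arithmetic: divide the time per pass by the table length to get nanoseconds per element, then multiply by the clock rate in GHz (the factors of 10^9 cancel). The sketch below assumes the 2 GHz clock quoted above and plugs in the "sequential multiply" figure for 1000 elements; the class and method names are hypothetical, chosen only for this example.

```java
final class CycleEstimate {
    // Nanoseconds spent on one element: time per pass divided by the table length.
    static double nsPerElement(double nsPerPass, int tableLength) {
        return nsPerPass / tableLength;
    }

    // Cycles = ns * (cycles per ns); a clock of X GHz runs X cycles per nanosecond.
    static double cycles(double ns, double clockGHz) {
        return ns * clockGHz;
    }

    // Upper bound on operations per second if each one takes `ns` nanoseconds.
    static double opsPerSecond(double ns) {
        return 1e9 / ns;
    }

    public static void main(String[] args) {
        double clockGHz = 2.0; // assumed clock rate, as stated in the text
        double ns = nsPerElement(2218.666, 1000); // "sequential multiply", 1000 elements
        System.out.printf("%.2f ns per element, %.1f cycles, %.0f million per second%n",
            ns, cycles(ns, clockGHz), opsPerSecond(ns) / 1e6);
    }
}
```

The same arithmetic explains the "half a billion" threshold: at a little over 2 ns per multiply-add, a single core tops out at roughly 450 to 500 million of them per second.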