Inspired by this question: Why is it faster to process a sorted array than an unsorted array?
I wrote my own branch prediction experiment:
public class BranchPrediction {
public static void main(final String[] args) {
long start;
long sum = 0;
/* No branch */
start = System.nanoTime();
sum = 0;
for (long i = 0; i < 10000000000L; ++i)
sum += i;
System.out.println(System.nanoTime() - start);
System.out.println(sum);
/* With branch */
start = System.nanoTime();
sum = 0;
for (long i = 0; i < 10000000000L; ++i)
if (i >= 0)
sum += i;
System.out.println(System.nanoTime() - start);
System.out.println(sum);
/* No branch (again) */
start = System.nanoTime();
sum = 0;
for (long i = 0; i < 10000000000L; ++i)
sum += i;
System.out.println(System.nanoTime() - start);
System.out.println(sum);
/* With branch (again) */
start = System.nanoTime();
sum = 0;
for (long i = 0; i < 10000000000L; ++i)
if (i >= 0)
sum += i;
System.out.println(System.nanoTime() - start);
System.out.println(sum);
}
}
The result confuses me: according to program output, the loop with a branch is reliably faster than no branch loops.
Example output:
7949691477
-5340232226128654848
6947699555
-5340232226128654848
7920972795
-5340232226128654848
7055459799
-5340232226128654848
Why is it so?
Edit:
- Disassembled class shows Java compiler did not optimize (miss) anything (https://gist.github.com/HouzuoGuo/5692424)
- The Java benchmark technique used by author of Why is it faster to process a sorted array than an unsorted array? is the same as mine.
- The machine is an Intel core i7, running Linux 3.2 64-bit and Oracle JVM 1.7 64-bit
- When I supersize the number of loop iterations, with-branch loop runs multi-SECONDS faster than non-branch loop.