I was having fun testing some code with Java, and as I was timing different segments of code, I came across an odd behavior. Repeating a simple segment of code inside a loop, for example 10 times, is significantly faster than literally copy-pasting the same segment 10 times:
Note that I followed the advice from this post about benchmarking in Java, and this code was run with the parameters -XX:+PrintCompilation
and -verbose:gc
.
//here max i = 3 (for shorter code), but I actually ran it with max i = 10
public static void main(String[] args) {
final long[] list1 = new long[100000000];
Random r = new Random();
long time;
for(int i = 0 ; i < list1.length ; i++) list1[i] = r.nextLong();
System.out.println("\n\nInitialization Done\n\n");
for(int i = 0 ; i < 3 ; i++) {
System.out.println("###Start###");
time = System.nanoTime();
for(long l : list1) {long n = l*65356;}
System.out.println("Time:\t" + (System.nanoTime()-time) + "\t");
System.out.println("####End####\n");
}
System.out.println("\n\nLoop Done\n\n");
System.out.println("###Start###");
time = System.nanoTime();
for(long l : list1) {long n = l*65356;}
System.out.println("Time:\t" + (System.nanoTime()-time) + "\t");
System.out.println("####End####\n");
System.out.println("###Start###");
time = System.nanoTime();
for(long l : list1) {long n = l*65356;}
System.out.println("Time:\t" + (System.nanoTime()-time) + "\t");
System.out.println("####End####\n");
System.out.println("###Start###");
time = System.nanoTime();
for(long l : list1) {long n = l*65356;}
System.out.println("Time:\t" + (System.nanoTime()-time) + "\t");
System.out.println("####End####\n");
}
And here are the surprising results plotted in a graph:
Console Output:
//I removed a whole bunch of initialization before this point
Initialization Done
###Start###
2926 32 % 3 EmptySlate::main @ 76 (1049 bytes)
2933 33 % 4 EmptySlate::main @ 76 (1049 bytes)
2935 32 % 3 EmptySlate::main @ -2 (1049 bytes) made not entrant
2935 33 % 4 EmptySlate::main @ -2 (1049 bytes) made not entrant
Time: 9570242
####End####
###Start###
2936 34 % 3 EmptySlate::main @ 76 (1049 bytes)
2945 35 % 4 EmptySlate::main @ 76 (1049 bytes)
2951 34 % 3 EmptySlate::main @ -2 (1049 bytes) made not entrant
Time: 15950404
####End####
###Start###
Time: 893
####End####
2952
36 1 java.nio.Buffer::position###Start### (5 bytes)
Time: 446
####End####
###Start### 2952 37 1 java.nio.Buffer::limit
(5 bytes)
Time: 892
####End####
###Start###
Time: 447
####End####
###Start###
Time: 446
####End####
###Start###
Time: 446
####End####
###Start###
Time: 447
####End####
###Start### 2952 38 3 java.nio.ByteBuffer::arrayOffset (35 bytes)
Time: 1785
####End####
Loop Done
###Start###
2952 35 % 4 EmptySlate::main @ -2 (1049 bytes) made not entrant
2952 39 3 java.nio.Buffer::position (43 bytes)
2952 40 % 3 EmptySlate::main @ 183 (1049 bytes)
2961 41 % 4 EmptySlate::main @ 183 (1049 bytes)
2963 40 % 3 EmptySlate::main @ -2 (1049 bytes) made not entrant
2963 41 % 4 EmptySlate::main @ -2 (1049 bytes) made not entrant
Time: 10607276
####End####
###Start###
2963 43 % 3 EmptySlate::main @ 272 (1049 bytes)
2972 42 3 java.nio.CharBuffer::arrayOffset (35 bytes)
2972 44 % 4 EmptySlate::main @ 272 (1049 bytes)
2975 43 % 3 EmptySlate::main @ -2 (1049 bytes) made not entrant
2975 44 % 4 EmptySlate::main @ -2 (1049 bytes) made not entrant
Time: 11731769
####End####
###Start###
2975 45 % 3 EmptySlate::main @ 361 (1049 bytes)
2983 46 % 4 EmptySlate::main @ 361 (1049 bytes)
2985 45 % 3 EmptySlate::main @ -2 (1049 bytes) made not entrant
2985 46 % 4 EmptySlate::main @ -2 (1049 bytes) made not entrant
Time: 10467160
####End####
###Start###
2985 47 % 3 EmptySlate::main @ 450 (1049 bytes)
2994 48 % 4 EmptySlate::main @ 450 (1049 bytes)
2996 47 % 3 EmptySlate::main @ -2 (1049 bytes) made not entrant
2996 48 % 4 EmptySlate::main @ -2 (1049 bytes) made not entrant
Time: 10436370
####End####
###Start###
2996 49 % 3 EmptySlate::main @ 539 (1049 bytes)
3005 50 % 4 EmptySlate::main @ 539 (1049 bytes)
3007 49 % 3 EmptySlate::main @ -2 (1049 bytes) made not entrant
3007 50 % 4 EmptySlate::main @ -2 (1049 bytes) made not entrant
Time: 10839760
3007 51 ####End####
3
java.lang.String::getChars (62 bytes)
###Start###
3007 52 % 3 EmptySlate::main @ 628 (1049 bytes)
3015 53 % 4 EmptySlate::main @ 628 (1049 bytes)
3017 52 % 3 EmptySlate::main @ -2 (1049 bytes) made not entrant
3017 53 % 4 EmptySlate::main @ -2 (1049 bytes) made not entrant
Time: 10452435
####End####
###Start###
3018 54 % 3 EmptySlate::main @ 717 (1049 bytes)
3026 55 % 4 EmptySlate::main @ 717 (1049 bytes)
3028 54 % 3 EmptySlate::main @ -2 (1049 bytes) made not entrant
3028 55 % 4 EmptySlate::main @ -2 (1049 bytes) made not entrant
Time: 10427892
####End####
###Start###
3028 56 % 3 EmptySlate::main @ 806 (1049 bytes)
3037 57 % 4 EmptySlate::main @ 806 (1049 bytes)
3039 56 % 3 EmptySlate::main @ -2 (1049 bytes) made not entrant
3039 57 % 4 EmptySlate::main @ -2 (1049 bytes) made not entrant
Time: 10888399 3039 58 3 java.util.Arrays::copyOfRange (63 bytes)
####End####
###Start###
3040 60 % 3 EmptySlate::main @ 895 (1049 bytes)
3048 59 3 java.nio.charset.CoderResult::isUnderflow (13 bytes)
3048 61 % 4 EmptySlate::main @ 895 (1049 bytes)
3050 60 % 3 EmptySlate::main @ -2 (1049 bytes) made not entrant
3050 61 % 4 EmptySlate::main @ -2 (1049 bytes) made not entrant
Time: 11396206
####End####
###Start###
3051 63 % 3 EmptySlate::main @ 984 (1049 bytes)
3059 62 3 java.nio.ByteBuffer::array (35 bytes)
3059 64 % 4 EmptySlate::main @ 984 (1049 bytes)
3061 63 % 3 EmptySlate::main @ -2 (1049 bytes) made not entrant
3061 64 % 4 EmptySlate::main @ -2 (1049 bytes) made not entrant
Time: 10975860
####End####
In the loop, regardless of the total number of cycles, the first two cycles are significantly slower and then goes down to almost nothing starting from cycle 3. Whereas, the explicit code remains mostly stable at the same value.
I've tried it in reverse order (explicit repetition first), with a different number of cycles, and I get the same result.
I thought maybe the compiler "only needs to prepare" once for the loop, not having to "initialize" at every iteration, but then why is the second cycle always slower, and only speeds up on cycle 3? I thought maybe it has to do with the CPU caching the array, but then the explicit code is also expected to be affected.
My question is Why do seemingly independent and equivalent segments of code run faster when repeated using a loop vs explicit repetition? Is the difference based at the compiler or hardware level?
CPU: Intel Core i7-3610QM @ 2.3GHz
Java version: 1.8.0_45
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)