Why do repeated segments of code run faster inside loops vs explicit repetition in Java?

Question

I was having fun testing some code with Java, and as I was timing different segments of code, I came across an odd behavior. Repeating a simple segment of code inside a loop, for example 10 times, is significantly faster than literally copy-pasting the same segment 10 times:
Note that I followed the advice from this post about benchmarking in Java, and this code was run with the parameters -XX:+PrintCompilation and -verbose:gc.

//here max i = 3 (for shorter code), but I actually ran it with max i = 10
public static void main(String[] args) {
    final long[] list1 = new long[100000000];
    Random r = new Random();
    long time;

    for(int i = 0 ; i < list1.length ; i++) list1[i] = r.nextLong();

    System.out.println("\n\nInitialization Done\n\n");

    for(int i = 0 ; i < 3 ; i++) {
        System.out.println("###Start###");
        time = System.nanoTime();
        for(long l : list1) {long n = l*65356;}
        System.out.println("Time:\t" + (System.nanoTime()-time) + "\t");
        System.out.println("####End####\n");
    }

    System.out.println("\n\nLoop Done\n\n");

    System.out.println("###Start###");
    time = System.nanoTime();
    for(long l : list1) {long n = l*65356;}
    System.out.println("Time:\t" + (System.nanoTime()-time) + "\t");
    System.out.println("####End####\n");

    System.out.println("###Start###");
    time = System.nanoTime();
    for(long l : list1) {long n = l*65356;}
    System.out.println("Time:\t" + (System.nanoTime()-time) + "\t");
    System.out.println("####End####\n");

    System.out.println("###Start###");
    time = System.nanoTime();
    for(long l : list1) {long n = l*65356;}
    System.out.println("Time:\t" + (System.nanoTime()-time) + "\t");
    System.out.println("####End####\n");
}

And here are the surprising results plotted in a graph:

Console Output:

//I removed a whole bunch of initialization before this point
Initialization Done


###Start###
   2926   32 %     3       EmptySlate::main @ 76 (1049 bytes)
   2933   33 %     4       EmptySlate::main @ 76 (1049 bytes)
   2935   32 %     3       EmptySlate::main @ -2 (1049 bytes)   made not entrant
   2935   33 %     4       EmptySlate::main @ -2 (1049 bytes)   made not entrant
Time:   9570242 
####End####

###Start###
   2936   34 %     3       EmptySlate::main @ 76 (1049 bytes)
   2945   35 %     4       EmptySlate::main @ 76 (1049 bytes)
   2951   34 %     3       EmptySlate::main @ -2 (1049 bytes)   made not entrant
Time:   15950404    
####End####

###Start###
Time:   893 
####End####
   2952 
  36       1       java.nio.Buffer::position###Start### (5 bytes)

Time:   446 
####End####

###Start###   2952   37       1       java.nio.Buffer::limit
 (5 bytes)
Time:   892 
####End####

###Start###
Time:   447 
####End####

###Start###
Time:   446 
####End####

###Start###
Time:   446 
####End####

###Start###
Time:   447 
####End####

###Start###   2952   38       3       java.nio.ByteBuffer::arrayOffset (35 bytes)

Time:   1785    
####End####

Loop Done

###Start###
   2952   35 %     4       EmptySlate::main @ -2 (1049 bytes)   made not entrant
   2952   39       3       java.nio.Buffer::position (43 bytes)
   2952   40 %     3       EmptySlate::main @ 183 (1049 bytes)
   2961   41 %     4       EmptySlate::main @ 183 (1049 bytes)
   2963   40 %     3       EmptySlate::main @ -2 (1049 bytes)   made not entrant
   2963   41 %     4       EmptySlate::main @ -2 (1049 bytes)   made not entrant
Time:   10607276    
####End####

###Start###
   2963   43 %     3       EmptySlate::main @ 272 (1049 bytes)
   2972   42       3       java.nio.CharBuffer::arrayOffset (35 bytes)
   2972   44 %     4       EmptySlate::main @ 272 (1049 bytes)
   2975   43 %     3       EmptySlate::main @ -2 (1049 bytes)   made not entrant
   2975   44 %     4       EmptySlate::main @ -2 (1049 bytes)   made not entrant
Time:   11731769    
####End####

###Start###
   2975   45 %     3       EmptySlate::main @ 361 (1049 bytes)
   2983   46 %     4       EmptySlate::main @ 361 (1049 bytes)
   2985   45 %     3       EmptySlate::main @ -2 (1049 bytes)   made not entrant
   2985   46 %     4       EmptySlate::main @ -2 (1049 bytes)   made not entrant
Time:   10467160    
####End####

###Start###
   2985   47 %     3       EmptySlate::main @ 450 (1049 bytes)
   2994   48 %     4       EmptySlate::main @ 450 (1049 bytes)
   2996   47 %     3       EmptySlate::main @ -2 (1049 bytes)   made not entrant
   2996   48 %     4       EmptySlate::main @ -2 (1049 bytes)   made not entrant
Time:   10436370    
####End####

###Start###
   2996   49 %     3       EmptySlate::main @ 539 (1049 bytes)
   3005   50 %     4       EmptySlate::main @ 539 (1049 bytes)
   3007   49 %     3       EmptySlate::main @ -2 (1049 bytes)   made not entrant
   3007   50 %     4       EmptySlate::main @ -2 (1049 bytes)   made not entrant
Time:   10839760    
   3007   51 ####End####
      3      
 java.lang.String::getChars (62 bytes)
###Start###
   3007   52 %     3       EmptySlate::main @ 628 (1049 bytes)
   3015   53 %     4       EmptySlate::main @ 628 (1049 bytes)
   3017   52 %     3       EmptySlate::main @ -2 (1049 bytes)   made not entrant
   3017   53 %     4       EmptySlate::main @ -2 (1049 bytes)   made not entrant
Time:   10452435    
####End####

###Start###
   3018   54 %     3       EmptySlate::main @ 717 (1049 bytes)
   3026   55 %     4       EmptySlate::main @ 717 (1049 bytes)
   3028   54 %     3       EmptySlate::main @ -2 (1049 bytes)   made not entrant
   3028   55 %     4       EmptySlate::main @ -2 (1049 bytes)   made not entrant
Time:   10427892    
####End####

###Start###
   3028   56 %     3       EmptySlate::main @ 806 (1049 bytes)
   3037   57 %     4       EmptySlate::main @ 806 (1049 bytes)
   3039   56 %     3       EmptySlate::main @ -2 (1049 bytes)   made not entrant
   3039   57 %     4       EmptySlate::main @ -2 (1049 bytes)   made not entrant
Time:   10888399       3039   58       3       java.util.Arrays::copyOfRange (63 bytes)

####End####

###Start###
   3040   60 %     3       EmptySlate::main @ 895 (1049 bytes)
   3048   59       3       java.nio.charset.CoderResult::isUnderflow (13 bytes)
   3048   61 %     4       EmptySlate::main @ 895 (1049 bytes)
   3050   60 %     3       EmptySlate::main @ -2 (1049 bytes)   made not entrant
   3050   61 %     4       EmptySlate::main @ -2 (1049 bytes)   made not entrant
Time:   11396206    
####End####

###Start###
   3051   63 %     3       EmptySlate::main @ 984 (1049 bytes)
   3059   62       3       java.nio.ByteBuffer::array (35 bytes)
   3059   64 %     4       EmptySlate::main @ 984 (1049 bytes)
   3061   63 %     3       EmptySlate::main @ -2 (1049 bytes)   made not entrant
   3061   64 %     4       EmptySlate::main @ -2 (1049 bytes)   made not entrant
Time:   10975860    
####End####

In the loop, regardless of the total number of cycles, the first two cycles are significantly slower and then goes down to almost nothing starting from cycle 3. Whereas, the explicit code remains mostly stable at the same value.

I've tried it in reverse order (explicit repetition first), with a different number of cycles, and I get the same result.

I thought maybe the compiler "only needs to prepare" once for the loop, not having to "initialize" at every iteration, but then why is the second cycle always slower, and only speeds up on cycle 3? I thought maybe it has to do with the CPU caching the array, but then the explicit code is also expected to be affected.

My question is Why do seemingly independent and equivalent segments of code run faster when repeated using a loop vs explicit repetition? Is the difference based at the compiler or hardware level?

CPU: Intel Core i7-3610QM @ 2.3GHz

Java version: 1.8.0_45
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

Because your benchmark is flawed - please use a proper framework like JMH to get meaningful results. — Clashsoft, Apr 21 '16 at 21:42
@Clashsoft they may be flawed, I'm not interested in the absolute times. But still, why do I see these results? — Maljam, Apr 21 '16 at 21:44
When the JVM notices that code has no observable result, it might skip it the next time. Like computing the product of your list. — erickson, Apr 21 '16 at 21:54

Why do repeated segments of code run faster inside loops vs explicit repetition in Java?

0 Answers0