13

I wrote a bit of code to make a multidimensional array rather than an array of arrays so that I could save some memory. Then I ran some tests to compare it's speed to that of the regular Java array of arrays (int[][]) as I don't want my program running slower even if it saves some memory. What I saw in the timing tests has me confused. Here are typical results for a test run. The times are for the same bit of code. Notice how the last two are much larger than the first four.

time: 58343722 ns
time: 59451156 ns
time: 51374777 ns

time: 61777424 ns
time: 813156695 ns
time: 782140511 ns

Now the first thing I thought was that the garbage collector what kicking in. I raised the memory limit to 5GB (-Xmx5g) so that the garbage collector defiantly wouldn't start. Nothing changed. I moved things around, but the pattern stays.

So what is the pattern? In the first three times, the bit of code is in a function and I call it three times. In the second three times the bit of code is repeated in a single function three times. So that the pattern is that whenever the bit of code is run multiple times in the same function the time it takes to run the bit of code will shoot up starting with the second bit of code and stay up there after.

I did find one alteration that will produce results like this:

time: 58729424 ns
time: 59965426 ns
time: 51441618 ns

time: 57359741 ns
time: 65362705 ns
time: 857942387 ns

What I did was add a one millisecond delay between the code bits of the second three. Doing that only speeds up the second of the code bits in the block and no amount of delay will speed up any of code bits there after.

Frankly, I'm confused. I can't explain this behavior. Can someone shed some light on what is going on?

Here is the code:

package multidimensionalarraytests;

import java.lang.reflect.Array;
import java.util.logging.Level;
import java.util.logging.Logger;

public class MultidimensionalArrayTests {
    static ArrayInt2Dv1 array=new ArrayInt2Dv1(10000,10000);

    public static void main(String[] args) {
        System.out.println("ignore the warmup");
        test();
        test();
        combined();
        combined();
        
        System.out.println("running tests");
        test();
        test();
        test();
        System.out.println();
        combined();
    }

    static long test(){
        int value=1;
        long start,stop,time;
        
        System.out.print("time: ");
        start=System.nanoTime();
        for(int x=0;x<array.length1;x++){
            for(int y=0;y<array.length2;y++){
                array.set(x, y, value);
                value=array.get(x, y);
            }   
        }
        stop=System.nanoTime();
        time=(stop-start);
        System.out.println(time+" ns");
        return time;
    }

    static void combined(){
        int value=1;
        long start,stop,time;
        
        System.out.print("time: ");
        start=System.nanoTime();
        for(int x=0;x<array.length1;x++){
            for(int y=0;y<array.length2;y++){
                array.set(x, y, value);
                value=array.get(x, y);
            }   
        }
        stop=System.nanoTime();
        time=(stop-start);
        System.out.println(time+" ns");

        //try {Thread.sleep(1);} catch (InterruptedException ex) {}

        System.out.print("time: ");
        start=System.nanoTime();
        for(int x=0;x<array.length1;x++){
            for(int y=0;y<array.length2;y++){
                array.set(x, y, value);
                value=array.get(x, y);
            }   
        }
        stop=System.nanoTime();
        time=(stop-start);
        System.out.println(time+" ns");

        //try {Thread.sleep(60000);} catch (InterruptedException ex) {}

        System.out.print("time: ");
        start=System.nanoTime();
        for(int x=0;x<array.length1;x++){
            for(int y=0;y<array.length2;y++){
                array.set(x, y, value);
                value=array.get(x, y);
            }   
        }
        stop=System.nanoTime();
        time=(stop-start);
        System.out.println(time+" ns");     
    }
}

and:

package multidimensionalarraytests;

public class ArrayInt2Dv1 {
    int [] array;

    public final int length1;
    public final int length2;

    public ArrayInt2Dv1(int length1, int length2){
        this.length1=length1;
        this.length2=length2;
        array=new int[length1*length2];
    }

    public int get(int x,int y){
        return array[x*length2+y];
    }

    public void set(int x,int y,int value){
        array[x*length2+y]=value;
    }
}

---Edit---

The output on Windows 7 with options -Xms5g -Xmx5g -XX:+PrintCompilation -verbose:gc -XX:CICompilerCount=1 -Xbatch

time:     299    1    b        multidimensionalarraytests.ArrayInt2Dv1::set (15 bytes)
    302    2    b        multidimensionalarraytests.ArrayInt2Dv1::get (14 bytes)
    303    1 %  b        multidimensionalarraytests.MultidimensionalArrayTests::test @ 31 (114 bytes)
    358    1 %           multidimensionalarraytests.MultidimensionalArrayTests::test @ -2 (114 bytes)   made not entrant
60671451 ns
    359    3    b        multidimensionalarraytests.MultidimensionalArrayTests::test (114 bytes)
time:     365    2 %  b        multidimensionalarraytests.MultidimensionalArrayTests::test @ 31 (114 bytes)
58104484 ns
time:     425    3 %  b        multidimensionalarraytests.MultidimensionalArrayTests::combined @ 31 (330 bytes)
69008251 ns
time: 806898159 ns
time: 845447596 ns
   2146    4    b        multidimensionalarraytests.MultidimensionalArrayTests::combined (330 bytes)
time: 52493169 ns
time: 804304528 ns
time: 845500191 ns
running tests
time: 51290771 ns
time: 51922285 ns
time: 51264108 ns

time: 52258679 ns
time: 842229025 ns
time: 871403625 ns

On Linux (Ubuntu on VirtualBox on the same machine) with the same options:

    283   1   b   java.lang.String::hashCode (60 bytes)
    285   2   b   sun.nio.cs.UTF_8$Encoder::encodeArrayLoop (490 bytes)
    287   3   b   java.lang.String::charAt (33 bytes)
    287   4   b   java.lang.String::indexOf (151 bytes)
    297   5   b   java.io.UnixFileSystem::normalize (75 bytes)
   2850   6   b   java.lang.String::lastIndexOf (156 bytes)
ignore the warmup
time:    5885   7   b   multidimensionalarraytests.ArrayInt2Dv1::set (15 bytes)
   5948   8   b   multidimensionalarraytests.ArrayInt2Dv1::get (14 bytes)
   5949   1%  b   multidimensionalarraytests.MultidimensionalArrayTests::test @ 31 (114 bytes)
11529998483 ns
  17565   9   b   multidimensionalarraytests.MultidimensionalArrayTests::test (114 bytes)
time: 1619622928 ns
time:   19718   2%  b   multidimensionalarraytests.MultidimensionalArrayTests::combined @ 31 (330 bytes)
475786382 ns
time: 288586857 ns
time: 315560700 ns
  20789  10   b   multidimensionalarraytests.MultidimensionalArrayTests::combined (330 bytes)
time: 460577230 ns
time: 311525066 ns
time: 312343429 ns
running tests
time: 310261854 ns
time: 298826592 ns
time: 304689920 ns

time: 315416579 ns
time: 299473245 ns
time: 290211350 ns
Community
  • 1
  • 1
user1145922
  • 339
  • 3
  • 13
  • 4
    For starters, take a look at http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java – NPE Jan 12 '12 at 17:30
  • Good advice. I changed the code above to reflect the recommendations and ran the program with the options: -Xms5g -Xmx5g -XX:+PrintCompilation -verbose:gc -XX:CICompilerCount=1 -Xbatch. It did't significantly change the outcome though. – user1145922 Jan 12 '12 at 19:30
  • 1
    I've tried (with the updated code), and I am not able to reproduce the effect using Java 6u26 on 64-bit Ubuntu. Here, all six elapsed times are well within 10% of each other (the warm-up times do fluctuate due to the just-in-time compiler kicking in). – NPE Jan 12 '12 at 20:11
  • cannot reproduce on a MacOS 10.6.8 with Java 1.7.0-ea-b223 (64Bit server vm 21.0-b17). As a followup to this/aix's comment could you provide your setup (OS plus java version) and did you run the programme with any other options? – Mateusz Dymczyk Jan 12 '12 at 20:51
  • I'm on Windows 7 and used Java 1.6.21 and 1.7.0 with the same result on both. I only used the options I listed above. – user1145922 Jan 12 '12 at 21:15
  • Also, could you edit the question to include the complete output produced by the program when run with `-Xms5g -Xmx5g -XX:+PrintCompilation -verbose:gc -XX:CICompilerCount=1 -Xbatch`. – NPE Jan 12 '12 at 21:19
  • I tested it on Linux (Ubuntu on VirtualBox on the same machine) and did not get the odd behavior. – user1145922 Jan 13 '12 at 19:15

1 Answers1

6

Try -XX:+PrintCompilation This should show that the whole method is optimised after the first loop iterates 10000 times. The problem is that the second/thirds loop is optimised with no statistical/counter information. Sometimes this doesn't matter, sometimes the later loops are much slower and if you swap the order of the loops, the later loops improve and the first loop gets slower.

The simple way to fix this is to place each loop in its own method and each loop will be optimised properly.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • OK, I see what your talking about. In the combined() function a compile operation occurs during the frist set of loops, but not the others. If I add the one ms sleeps in then compile operations occur during the first two sets of loops, but not the third. The loops where the compile operations occur correspond to the faster executions. It seems rather bizarre to me that it would only optimize the first set of loops and not the others. Well I least I know what it's doing though not why. – user1145922 Jan 13 '12 at 00:30
  • One odd thing though is that the same optimizations are done when I use an int[][] rather than my ArrayInt2Dv1, but there is no time penalty for not running the optimization on the last two loops. It's rather annoying as I won't be able to simply replace Java arrays of arrays with my multidimensional arrays without keeping the workings of the optimizer in mind or things could get rather slow. – user1145922 Jan 13 '12 at 00:31
  • All the loops are optimised, however the later loops are optimised based on no dynamically collected information (as the method is compiled at once base don the information it has at that point) – Peter Lawrey Jan 13 '12 at 08:29