
I had the code below. I just wanted to check the running time of a code block, and I mistakenly copied and pasted the same code again and got an interesting result. Although the code blocks are identical, their running times are different, and code block 1 takes more time than the others. If I switch the code blocks (say, move code block 4 to the top), then code block 4 takes more time than the others.

I used two different types of arrays in my code blocks to check whether it depends on that, and the result is the same: among the code blocks that use the same type of array, the topmost one takes more time. See the code and the output below.

import java.util.Arrays;

public class ABBYtest {

public static void main(String[] args) {
    long startTime;
    long endTime;

    //code block 1
    startTime = System.nanoTime();
    Long a[] = new Long[10];
    for (int i = 0; i < a.length; i++) {
        a[i] = 12L;
    }
    Arrays.sort(a);
    endTime = System.nanoTime();
    System.out.println("code block (has Long array) 1 = " + (endTime - startTime));

    //code block 6
    startTime = System.nanoTime();
    Long aa[] = new Long[10];
    for (int i = 0; i < aa.length; i++) {
        aa[i] = 12L;
    }
    Arrays.sort(aa);
    endTime = System.nanoTime();
    System.out.println("code block (has Long array) 6 = " + (endTime - startTime));


    //code block 7
    startTime = System.nanoTime();
    Long aaa[] = new Long[10];
    for (int i = 0; i < aaa.length; i++) {
        aaa[i] = 12L;
    }
    Arrays.sort(aaa);
    endTime = System.nanoTime();
    System.out.println("code block (has Long array) 7 = " + (endTime - startTime));

    //code block 2
    startTime = System.nanoTime();
    long c[] = new long[10];
    for (int i = 0; i < c.length; i++) {
        c[i] = 12L;
    }
    Arrays.sort(c);
    endTime = System.nanoTime();
    System.out.println("code block (has long array) 2 = " + (endTime - startTime));

    //code block 3
    startTime = System.nanoTime();
    long d[] = new long[10];
    for (int i = 0; i < d.length; i++) {
        d[i] = 12L;
    }
    Arrays.sort(d);
    endTime = System.nanoTime();
    System.out.println("code block (has long array) 3 = " + (endTime - startTime));

    //code block 4
    startTime = System.nanoTime();
    long b[] = new long[10];
    for (int i = 0; i < b.length; i++) {
        b[i] = 12L;
    }
    Arrays.sort(b);
    endTime = System.nanoTime();
    System.out.println("code block (has long array) 4 = " + (endTime - startTime));

    //code block 5
    startTime = System.nanoTime();
    Long e[] = new Long[10];
    for (int i = 0; i < e.length; i++) {
        e[i] = 12L;
    }
    Arrays.sort(e);
    endTime = System.nanoTime();
    System.out.println("code block (has Long array) 5 = " + (endTime - startTime));


}
}

The running times:

code block (has Long array) 1 = 802565
code block (has Long array) 6 = 6158
code block (has Long array) 7 = 4619
code block (has long array) 2 = 171906
code block (has long array) 3 = 4105
code block (has long array) 4 = 3079
code block (has Long array) 5 = 8210

As we can see, the first code block containing a Long array takes more time than the other blocks that contain Long arrays, and the same holds for the first code block containing a long array.

Can anyone explain this behavior, or am I making some mistake here?

prime
  • I would expect this to be some VM overhead; something "warming up," perhaps the garbage collector doing something in the background. – asthasr Dec 28 '13 at 04:45
  • Please have a look at [this answer](http://stackoverflow.com/a/513259/327038). – asthasr Dec 28 '13 at 04:46

4 Answers


Faulty benchmarking. A non-exhaustive list of what is wrong:

  • No warmup: single-shot measurements are almost always wrong;
  • Mixing several code paths in a single method: we probably start compiling the method with execution data available only for the first loop in the method;
  • Sources are predictable: should the loop compile, we can actually predict the result;
  • Results are dead-code eliminated: should the loop compile, we can throw the computation away entirely (a minimal hand-rolled sketch addressing the warmup and dead-code points follows this list).
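
For contrast, here is a hand-rolled sketch that at least adds a warmup phase, averages over many iterations, and consumes the results; the class and method names are illustrative, and this is still no substitute for a proper harness:

import java.util.Arrays;

public class SlightlyLessNaive {

    public static void main(String[] args) {
        long sink = 0; // accumulate results so the measured work cannot be dead-code eliminated

        // warmup: give the JIT a chance to compile the measured code before timing it
        for (int i = 0; i < 20000; i++) {
            sink += sortOnce();
        }

        // measure many iterations and report an average, not a single shot
        final int iterations = 100000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += sortOnce();
        }
        long end = System.nanoTime();

        System.out.println("avg ns/op = " + (end - start) / (double) iterations);
        System.out.println("sink = " + sink); // print the sink so it stays observable
    }

    private static long sortOnce() {
        Long[] a = new Long[10];
        for (int i = 0; i < a.length; i++) {
            a[i] = 12L;
        }
        Arrays.sort(a);
        return a[0]; // return something derived from the result
    }
}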

Here is how you do it arguably right with JMH:

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 3, time = 1)
@Fork(10)
@State(Scope.Thread)
public class Longs {

    public static final int COUNT = 10;

    private Long[] refLongs;
    private long[] primLongs;

    /*
     * Implementation notes:
     *   - copying the array from the field keeps the constant
     *     optimizations away, but we are implicitly counting the
     *     cost of arraycopy() in;
     *   - two additional baseline experiments quantify the
     *     scale of the arraycopy effects (note you can't directly
     *     subtract the baseline scores from the test scores, because
     *     the code is mixed together);
     *   - the resulting arrays are always fed back into JMH
     *     to prevent dead-code elimination
     */

    @Setup
    public void setup() {
        primLongs = new long[COUNT];
        for (int i = 0; i < COUNT; i++) {
            primLongs[i] = 12L;
        }

        refLongs = new Long[COUNT];
        for (int i = 0; i < COUNT; i++) {
            refLongs[i] = 12L;
        }
    }

    @GenerateMicroBenchmark
    public long[] prim_baseline() {
        long[] d = new long[COUNT];
        System.arraycopy(primLongs, 0, d, 0, COUNT);
        return d;
    }

    @GenerateMicroBenchmark
    public long[] prim_sort() {
        long[] d = new long[COUNT];
        System.arraycopy(primLongs, 0, d, 0, COUNT);
        Arrays.sort(d);
        return d;
    }

    @GenerateMicroBenchmark
    public Long[] ref_baseline() {
        Long[] d = new Long[COUNT];
        System.arraycopy(refLongs, 0, d, 0, COUNT);
        return d;
    }

    @GenerateMicroBenchmark
    public Long[] ref_sort() {
        Long[] d = new Long[COUNT];
        System.arraycopy(refLongs, 0, d, 0, COUNT);
        Arrays.sort(d);
        return d;
    }

}

...this yields:

Benchmark                   Mode   Samples         Mean   Mean error    Units
o.s.Longs.prim_baseline     avgt        30       19.604        0.327    ns/op
o.s.Longs.prim_sort         avgt        30       51.217        1.873    ns/op
o.s.Longs.ref_baseline      avgt        30       16.935        0.087    ns/op
o.s.Longs.ref_sort          avgt        30       25.199        0.430    ns/op

At this point you may start to wonder why sorting Long[] and sorting long[] take different amounts of time. The answer lies in the Arrays.sort() overloads: OpenJDK sorts primitive and reference arrays with different algorithms (reference arrays with TimSort, primitive arrays with dual-pivot quicksort). Here is what happens when we choose another algorithm with -Djava.util.Arrays.useLegacyMergeSort=true, which falls back to merge sort for reference arrays:

Benchmark                   Mode   Samples         Mean   Mean error    Units
o.s.Longs.prim_baseline     avgt        30       19.675        0.291    ns/op
o.s.Longs.prim_sort         avgt        30       50.882        1.550    ns/op
o.s.Longs.ref_baseline      avgt        30       16.742        0.089    ns/op
o.s.Longs.ref_sort          avgt        30       64.207        1.047    ns/op

Hope that helps to explain the difference.

The explanation above barely scratches the surface of sorting performance. The numbers can be very different for different source data (including the presence of pre-sorted subsequences, their patterns and run lengths, and the size of the data itself).
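
As a rough illustration of that point, the benchmark above could be extended with a JMH @Param to feed differently-shaped inputs; the parameter values and fill patterns below are illustrative assumptions, not something measured for this answer:

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@State(Scope.Thread)
public class LongsPatterns {

    public static final int COUNT = 10;

    @Param({"sorted", "reversed", "random"})
    private String pattern;

    private Long[] refLongs;

    @Setup
    public void setup() {
        refLongs = new Long[COUNT];
        Random r = new Random(42);
        for (int i = 0; i < COUNT; i++) {
            if ("sorted".equals(pattern)) {
                refLongs[i] = (long) i;             // already ascending
            } else if ("reversed".equals(pattern)) {
                refLongs[i] = (long) (COUNT - i);   // strictly descending
            } else {
                refLongs[i] = r.nextLong();         // no exploitable runs
            }
        }
    }

    @GenerateMicroBenchmark
    public Long[] ref_sort() {
        // copy and sort, same shape as the ref_sort benchmark above
        Long[] d = new Long[COUNT];
        System.arraycopy(refLongs, 0, d, 0, COUNT);
        Arrays.sort(d);
        return d;
    }
}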

Aleksey Shipilev

Can anyone explain this behavior, or am I making some mistake here?

Your problem is a badly written benchmark. You are not taking account of JVM warmup effects: things like the overhead of loading code, initial expansion of the heap, and JIT compilation. In addition, application startup always generates extra garbage that needs to be collected.

In addition, if your application itself generates garbage (and I expect that sort and / or println are doing that) then you need to take account of possible GC runs during the "steady state" phase of your benchmark application's run.

See this Q&A for hints on how to write valid Java benchmarks.

There are numerous other articles on this; Google for "how to write a java benchmark".


In this example, I suspect that the first code block takes so much longer than the rest because of (initially) bytecode interpretation followed by the overhead of JIT compilation. You may well be garbage collecting to deal with temporary objects created during loading and JIT compilation. The high value for the 4th measurement is most likely due to another garbage collection cycle.

However, one would need to turn on some JVM logging to figure out the real cause.
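
For example (assuming a HotSpot JVM), re-running the program with GC and JIT compilation logging enabled would show whether collections or compilations coincide with the slow measurements:

java -verbose:gc -XX:+PrintCompilation ABBYtest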

Stephen C
  • Why is it always happening the same way? I mean, if I have only the code blocks that use `Long` arrays, the topmost code block will take more time. – prime Dec 28 '13 at 04:55
  • Because of the things I mentioned in my answer. – Stephen C Dec 28 '13 at 04:57

Just to add to what everyone else is saying: Java will not necessarily compile everything. When it analyzes the code for optimization, the JVM will often choose to keep interpreting code that is not used extensively. If you look at the bytecode, your Long arrays should always take more time (and certainly more space) than your long arrays, but, as has been pointed out, warmup effects also play a part.
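
For instance, the standard javap tool shows the compiled bytecode of the question's class; the Long blocks contain extra Long.valueOf boxing calls that the long blocks do not:

javap -c ABBYtest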

ed209

This could be due to a few things:

  • As noted by syrion, Java's virtual machine is allowed to perform optimizations on your code as it is running. Your first block is likely taking longer because Java hasn't yet fully optimized your code. As the first block runs, the JVM applies changes that can then be used by the other blocks.
  • Your processor could be caching the results of your code, speeding up future blocks. This is similar to the previous point, but it can vary even between identical JVM implementations.
  • While your program is running, your computer is also performing other tasks: handling the OS's UI, checking for program updates, and so on. For this reason, some blocks of code can be slower than others, because your computer isn't devoting as many resources to their execution.
  • Java's virtual machine is garbage collected. That is to say, at unspecified points during your program's execution, the JVM takes some time to clean up any objects that are no longer used.

Points 1 and 2 are likely the cause for the large difference in the first block's execution time. Point 3 could be the reason for the smaller fluctuations, and point 4, as noted by Stephen, probably caused the large stall in block 3.

One other thing worth noting is your use of both long and Long. The object form carries extra memory overhead, and the two are subject to different optimizations.

Kyle