Some regression when using initial capacity for ArrayList on the first iterations

Question

I'm a bit confused. On the first iterations of fill loops I see some regression in filling time when using initial capacity for ArrayList vs without using initial capacity.

According to the common sense and this question: Why start an ArrayList with an initial capacity?

it must be absolutely inversely.

It is not well-written benchmark test, and I wonder: why on the first iteration it always consumes much more time and CPU when using initial capacity for ArrayList?

This is the test:

public class TestListGen {
    public static final int TEST = 100_000_000;  

    public static void main(String[] args) {
        test(false);
    }    

    private static void test(boolean withInitCapacity) {
        System.out.println("Init with capacity? " + withInitCapacity);
        for (int i = 0; i < 5; i++)
            av += fillAndTest(TEST, withInitCapacity ? new ArrayList<Integer>(TEST) : new ArrayList<Integer>());

        System.out.println("Average: " + (av / 5));
    }    

    private static long fillAndTest(int capacity, List<Integer> list) {
        long time1 = System.nanoTime();
        for (int i = 0; i < capacity; i++) list.add(i);
        long delta = System.nanoTime() - time1;
        System.out.println(delta);
        return delta;
    }
}

Output: 1)

Init with capacity? false
17571882469
12179868327
18460127904
5894883202
13223941250
Average: 13466140630

2)

Init with capacity? true
37271627087
16341545990
19973801769
4888093008
2442179779
Average: 16183449526

I've tested it on: JDK 1.7.0.40, JDK 1.8.0.31

What happens if you run the tests in the opposite order (that is, first with initial capacity, then without)? — Scott Hunter, Mar 25 '15 at 14:45
GC randomly kicking in. You need to run it for more iterations than 5 to have sensible measurement.... — Zielu, Mar 25 '15 at 14:45
Benchmarking Java code is hard. Please take a look at http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java — NPE, Mar 25 '15 at 14:46
@ScottHunter I've run in separate `jvm` starts, i.e. not in one run. — Andremoniy, Mar 25 '15 at 14:46
@Zielu it always consumes much more time and CPU on the first iteration of array. Test it by your self and you will see. (I mean 1st iteration with initial capacity vs witout it) — Andremoniy, Mar 25 '15 at 14:47
This is what happens when you don't execute a well micro benchmark... — Luiggi Mendoza, Mar 25 '15 at 14:53
NPE already did... there's no need to repost the same link twice. And you will be better using a micro benchmark framework like JMH or Caliper. — Luiggi Mendoza, Mar 25 '15 at 14:56
Your measured timings vary by an order of magnitude in the "with" case, with both of the two fastest measurements beating the fastest "without" measurement. This does not present a clear picture. — John Bollinger, Mar 25 '15 at 14:58
@Andremoniy, I suggest you reformulate your question as, "Why am I seeing a 16% regression when using initial capacity" instead of "Hey, using initial capacity is slower, why?". I think it's an interesting question (I can reproduce and my machine doesn't seem as fast as yours, but I'm still seeing a 16% difference.) I bet that when someone that *actually* knows what's going on steps forward and gives a decent answer to your question, we will all learn something here, (presumably related to micro benchmarking rather than ArrayList issues). — aioobe, Mar 25 '15 at 14:58
@Andremoniy. I just run it with 100 iteration instead of 5: With init false Average: 423499262, With init true Average: 177595371, so with initial capacity it is 2 times faster. You cannot just use few iterations, especially if inside those 5 you have such a range of values. — Zielu, Mar 25 '15 at 15:00
@Zielu I have reformulated my question. I wonder why **first** iterations give me much more time and CPU usage when using initial capacity. — Andremoniy, Mar 25 '15 at 15:19
What's heap size and machine memory size? Your list is pretty big - occupying > 1 GB — Random42, Mar 25 '15 at 15:22
@m3th0dman `32GB` on board, aprox. `5GB` app consumes after 1st iteration — Andremoniy, Mar 25 '15 at 15:25

score 2 · Accepted Answer · answered Mar 25 '15 at 15:46

It's a Java heap allocation artifact that is causing the results that you don't expect. Adjust the initial heap allocation and you will see more consistent results by removing heap allocation times from the mix. Also, you need to be sure that the process running the benchmark is not being swapped. On my system I got an OOM error when TEST = 100_000_000 and had to reduce it to 10_000_000 for my tests. Also I ran both test(false) and test(true) one after the other. Note how allocating the heap at startup and adding explicit gc in the results below make the individual times much more consistent. Adding back a warmup would also be important to make the test more consistent, but I didn't bother with that.

Original Test

Init with capacity? false
1714537208
1259523722
1215986030
1098740959
1029914697
Average: 1263740523
Init with capacity? true
343100302
612709138
355210333
603609642
348401796
Average: 452606242

Test with -Xms500m -Xmx500m

Init with capacity? false
682827716
738137558
576581143
662777089
555706338
Average: 643205968
Init with capacity? true
368245589
312674836
297705054
392935762
307209139
Average: 335754076

Test with -Xms500m -Xmx500m + System.gc() before fillAndTest()

Init with capacity? false
502767979
435508363
420956590
487184801
416041923
Average: 452491931
Init with capacity? true
300744404
298446734
299080656
300084036
298473576
Average: 299365881

@Andremoniy Sure thing. Isn't it interesting how the more steps you take to eliminate heap allocations/gc from the mix the closer the performance gets between constructing with and without init capacity? It's a testimony to the magic of growing array capacity dynamically by 1.5. — Kevin Condon, Mar 25 '15 at 16:04

Some regression when using initial capacity for ArrayList on the first iterations

1 Answers1