20

For an experiment, I made this little program. It just generates 10 million random strings and adds them to an `ArrayList`. Notice that the `ArrayList` is not given an initial capacity.

// editors note: added the necessary boilerplate to run,
// and take initial capacity as an optional cmdline arg for easier testing
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ArrayListTest {
    public static void main(String[] args)
    {
        int initsize = -1;
        if (args.length > 0) {
            initsize = Integer.parseInt(args[0]);
        }

        long startTime = System.currentTimeMillis();

        List<String> randNums = initsize>=0 ? new ArrayList<>(initsize) : new ArrayList<>();
        // final List<String> randNums = initsize>=0 ? new ArrayList<String>(initsize) : new ArrayList<String>();

        Random r = new Random(1);

        for (int i = 0; i < 10_000_000; i++) {
            final int randomNum = r.nextInt();
            randNums.add(Integer.toString(randomNum));
        }

        System.out.println(System.currentTimeMillis() - startTime);
    }
}

I ran it 5 times, and the results in ms were:

8917
8720
7814
8768
8867

I then altered the program to give the ArrayList an initial capacity:

    List<String> randNums = new ArrayList<>(10_000_000);

I again ran it 5 times, and these were the results:

11562
10454
10499
10481
10415

It was consistently slower. I expected the opposite: by declaring a big enough initial capacity, I took away all the overhead of the ArrayList growing its own backing array. Why was it slower?
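(Editor's note: for scale, here is a sketch of how much resize work is actually being given up. It assumes OpenJDK 8's growth policy of `newCapacity = oldCapacity + (oldCapacity >> 1)`, i.e. roughly 1.5x per resize, starting from the default capacity of 10; the class name is made up for illustration.)

```java
// Sketch: count the internal array copies a default-capacity ArrayList
// performs on the way to 10 million elements, assuming OpenJDK 8's
// growth policy: newCapacity = oldCapacity + (oldCapacity >> 1).
class GrowthCount {
    public static void main(String[] args) {
        int capacity = 10;        // ArrayList's default initial capacity
        int resizes = 0;
        long elementsCopied = 0;  // Arrays.copyOf copies the old array each time
        while (capacity < 10_000_000) {
            elementsCopied += capacity;
            capacity += capacity >> 1;
            resizes++;
        }
        // Around 35 resizes and under 30 million reference copies in
        // total -- modest compared with allocating 10 million Strings.
        System.out.println(resizes + " resizes, " + elementsCopied + " references copied");
    }
}
```

So the geometric growth keeps the total copy cost small, which is consistent with the resize overhead not dominating the runtime.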

Further info: JRE 1.8.0_51, 64-bit (on Windows 10).

Peter Cordes
Andrio
  • I don't see this behavior when I try it on my laptop, but this is likely to be platform-dependent. The only thing I can think of is that if you allocate space for the entire array all at once, it may end up swapping memory earlier, which could slow things down. – ajb Aug 31 '15 at 01:28
  • which JVM? 64bit java defaults to the `server` vm, which JIT-compiles more aggressively than the default vm for 32bit java. So you sometimes see weird perf differences with 32bit java for things that a decent JIT-compiler would optimize away. – Peter Cordes Aug 31 '15 at 01:40
  • What you're seeing is probably noise. Try adding "1" every time instead of doing the random generation. – Diego Basch Aug 31 '15 at 01:44
  • @Diego Basch, I used a seed for the random generator, so that it generated the same numbers with every run. – Andrio Aug 31 '15 at 01:46
  • That's not my point, it's about the added slowness. You're trying to measure the growth of the array and it may be masked by other stuff that's much slower. – Diego Basch Aug 31 '15 at 01:48
  • Diego's point was that random number generation is slow, and might be dominating the runtime. That's clearly not the case, since your 5 samples for each test show there's pretty clearly a statistically significant effect here. Java's default RNG is very simple and fast. Adding a constant string would take Integer.toString() out of the picture, too, though. – Peter Cordes Aug 31 '15 at 01:50
  • You are actually measuring Full GC pauses and heap resizing time, because the JVM starts with a small initial heap size, but the actual usage grows during the test. One of the main rules of benchmarking is to measure the steady state of the application, but here you are benchmarking a transitional state, which usually does not make much sense. – apangin Aug 31 '15 at 02:02
  • Try running Java with `-Xmx2g -Xms2g` and the test will be an order of magnitude faster. Use `-XX:+PrintGCDetails` to make sure GC pauses do not affect your test timings. – apangin Aug 31 '15 at 02:02
  • The main culprit in this kind of test is often the testing methodology (i.e. you're not measuring what you think you are measuring)... Using a benchmarking library such as jmh would probably yield different results. – assylias Aug 31 '15 at 02:03
  • Please tell us about all jvm arguments. – fps Aug 31 '15 at 02:06

2 Answers

19

You'd think it would be the other way around, but it comes down to garbage collection.

I didn't see as big a difference as you did (3900 ms vs. 5100 ms), but since this is GC-related, you're probably running with less memory than I am. I ran 64-bit with -Xmx4g.

Turning on the GC log (-Xloggc:path\to\file.log), I get this without size:

Java HotSpot(TM) 64-Bit Server VM (25.51-b03) for windows-amd64 JRE (1.8.0_51-b16), built on Jun  8 2015 18:03:07 by "java_re" with MS VC++ 10.0 (VS2010)
Memory: 4k page, physical 33478384k(25822824k free), swap 33476532k(26121788k free)
CommandLine flags: -XX:InitialHeapSize=535654144 -XX:MaxHeapSize=4294967296 -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:-UseLargePagesIndividualAllocation -XX:+UseParallelGC 
0.188: [GC (Allocation Failure)  131584K->114906K(502784K), 0.0795857 secs]
0.358: [GC (Allocation Failure)  246490K->229080K(634368K), 0.0794026 secs]
0.631: [GC (Allocation Failure)  492248K->452871K(716288K), 0.1389293 secs]
0.770: [Full GC (Ergonomics)     452871K->451407K(1188864K), 3.3224726 secs]
4.270: [GC (Allocation Failure)  714575K->714179K(1271808K), 0.2607092 secs]
4.531: [Full GC (Ergonomics)     714179K->814K(1050624K), 0.0070693 secs]

And I get this with the size:

Java HotSpot(TM) 64-Bit Server VM (25.51-b03) for windows-amd64 JRE (1.8.0_51-b16), built on Jun  8 2015 18:03:07 by "java_re" with MS VC++ 10.0 (VS2010)
Memory: 4k page, physical 33478384k(25818308k free), swap 33476532k(26123684k free)
CommandLine flags: -XX:InitialHeapSize=535654144 -XX:MaxHeapSize=4294967296 -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:-UseLargePagesIndividualAllocation -XX:+UseParallelGC 
0.171: [GC (Allocation Failure)  131584K->129070K(502784K), 0.0919490 secs]
0.370: [GC (Allocation Failure)  260654K->261166K(634368K), 0.4433150 secs]
0.813: [Full GC (Ergonomics)     261166K->260351K(899072K), 1.4135289 secs]
2.460: [GC (Allocation Failure)  523519K->524487K(899072K), 0.7901689 secs]
3.250: [Full GC (Ergonomics)     524487K->523517K(1421312K), 2.3177831 secs]

Because the second run allocated so much more memory up front, the GC runs get slower: summing the pause times in the two logs gives about 3.9 seconds without the initial size and about 5.1 seconds with it, which accounts for almost all of the runtime difference. That apparently outweighs the extra array copying that happens when the list resizes itself.
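(Editor's note: you don't need a log file to see this. The standard `GarbageCollectorMXBean`s report cumulative GC time, so a sketch like the following can print how much of the wall time went to GC; the class name is made up for illustration.)

```java
// Sketch: measure how much of the run is spent in GC using the standard
// GarbageCollectorMXBeans (getCollectionTime returns cumulative ms).
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class GcShare {
    public static void main(String[] args) {
        long wallStart = System.currentTimeMillis();

        // Same workload as the question; swap the constructor to compare.
        List<String> randNums = new ArrayList<>(10_000_000);
        Random r = new Random(1);
        for (int i = 0; i < 10_000_000; i++) {
            randNums.add(Integer.toString(r.nextInt()));
        }
        long wall = System.currentTimeMillis() - wallStart;

        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcMillis += gc.getCollectionTime();
        }
        System.out.println("wall: " + wall + " ms, of which GC: " + gcMillis + " ms");
    }
}
```

Running this with and without the initial capacity (and with different -Xms values) makes the GC share of the runtime directly visible.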

Andreas
  • Using `-Xms2g` to increase the starting memory size speeds it up tremendously with or without large initial capacity. It also makes the real time ~= user cpu time (no GC threads going nuts). No extra JVM args: real=5.5s, user=12s, sys=3.3s. With `-Xms2g`: real=1.2s, user=1.6s, sys=0.6s. Intel SnB i5-2500k (3.8GHz turbo), dual channel DDR3 1600MHz, Linux 3.19, `OpenJDK 64-Bit Server VM (build 25.45-b02, mixed mode)`. – Peter Cordes Aug 31 '15 at 03:33
  • @PeterCordes Yeah, with `-Xms` you're preallocating the memory, eliminating the `GC (Allocation Failure)` runs, likely eliminating all GC runs. This shows the fallacy of small tests, where the VM hasn't "warmed up". --- This is what `apangin`'s comment (to question) was about. – Andreas Aug 31 '15 at 03:36
  • Why does the GC care about the amount of memory? Shouldn't it care about objects only? It has to check the same number of objects in both cases, so it shouldn't be slower. – Thomas Weller Aug 31 '15 at 09:13
  • @Thomas The GC has to trace all live *references*, which is not the same as the number of *object instances*. There are no backward pointers from an object to all its referrers, so GC time also depends on the number of live references. – Durandal Aug 31 '15 at 17:56
2

The randNums list with 10 million elements requires about 700 MB of memory. To rule out the effects of GC (which clearly matter a lot in this test), I set the HotSpot VM arguments like this:

-XX:+PrintGC
-XX:+PrintGCTimeStamps
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-Xmx1000m
-Xms1000m
-Xmn999m
-XX:SurvivorRatio=65535

This makes the young generation large enough to hold all the elements, so no GC happens while they are being allocated, and it enlarges the Eden region so that elements are not copied around within the young generation.
The result is surprising! Total execution time decreases from 8 s to 0.6 s!
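(Editor's note: a back-of-envelope check of the ~700 MB figure. The per-object sizes below are typical for 64-bit HotSpot with compressed oops and Java 8's char[]-backed Strings, but they are assumptions, not measured values.)

```java
// Sketch: rough memory estimate for 10 million random-int strings.
// Assumed sizes: ~24 bytes for the String object itself, ~16 bytes of
// char[] header, 2 bytes per char, 4 bytes per ArrayList reference slot.
class MemoryEstimate {
    public static void main(String[] args) {
        long n = 10_000_000L;
        long avgChars = 11;                       // e.g. "-1155869325" is 11 chars
        long perString = 24 + 16 + 2 * avgChars;  // object + backing char[]
        long totalBytes = n * (perString + 4);    // plus the list's reference
        System.out.println(totalBytes / (1024 * 1024) + " MB"); // ~630 MB
    }
}
```

That lands in the same ballpark as the ~700 MB figure once object-alignment padding is included, so a 1000 MB heap with a 999 MB young generation comfortably holds everything.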

I also did some extra work for the asker: testing whether pre-allocating the ArrayList saves time, and how much it helps.
Here is my code:

        final int SIZE = 10_000_000; // same element count as the question
        long startTime;
        List<String> randNums;
        Random r = new Random(1);

        System.out.println("-----------------------------ArrayList With Enough Capacity Allocated:----------");
        for(int loop=0;loop<5;loop++) {
            startTime = System.currentTimeMillis();
            randNums = new ArrayList<String>(SIZE);
            for (int i = 0; i <SIZE ; i++) {
                int randomNum = r.nextInt();
                randNums.add(Integer.toString(randomNum));
            }
            System.out.println(System.currentTimeMillis() - startTime); //print time of this loop
            randNums.clear();
            System.gc();
        }

        System.out.println("\n-----------------------------ArrayList Auto-Capacity:----------");
        for(int loop=0;loop<5;loop++) {
            startTime = System.currentTimeMillis();
            randNums = new ArrayList<String>();
            for (int i = 0; i <SIZE ; i++) {
                int randomNum = r.nextInt();
                randNums.add(Integer.toString(randomNum));
            }
            System.out.println(System.currentTimeMillis() - startTime); //print time of this loop
            randNums.clear();
            System.gc();
        }

The output is:

-----------------------------ArrayList With Enough Capacity Allocated:----------
625
0.712: [Full GC (System.gc())  714143K->39628K(1023936K), 0.1450449 secs]
0.863: [GC (CMS Initial Mark)  98268K(1023936K), 0.0549729 secs]
545
1.413: [Full GC (System.gc())  705185K->564K(1023936K), 0.1239084 secs]
483
2.031: [Full GC (System.gc())  679570K->564K(1023936K), 0.1256323 secs]
2.160: [GC (CMS Initial Mark)  59357K(1023936K), 0.0274108 secs]
523
2.688: [Full GC (System.gc())  670987K->564K(1023936K), 0.1222910 secs]
482
3.302: [Full GC (System.gc())  673223K->564K(1023936K), 0.1299938 secs]

-----------------------------ArrayList Auto-Capacity:----------
3.432: [GC (CMS Initial Mark)  40961K(1023936K), 0.0003740 secs]
3.907: [GC (CMS Final Remark)  698381K(1023936K), 0.1091347 secs]
796
4.240: [Full GC (System.gc())  911984K->56183K(1023936K), 0.1719540 secs]
4.412: [GC (CMS Initial Mark)  56183K(1023936K), 0.0394210 secs]
4.770: [GC (CMS Final Remark)  528894K(1023936K), 0.0726012 secs]
690
5.111: [Full GC (System.gc())  893818K->2060K(1023936K), 0.1364215 secs]
5.248: [GC (CMS Initial Mark)  20769K(1023936K), 0.0008902 secs]
5.750: [GC (CMS Final Remark)  694113K(1023936K), 0.1124856 secs]
704
5.962: [Full GC (System.gc())  808646K->2081K(1023936K), 0.1338328 secs]
6.096: [GC (CMS Initial Mark)  21137K(1023936K), 0.0010118 secs]
6.599: [GC (CMS Final Remark)  688155K(1023936K), 0.0899929 secs]
661
6.767: [Full GC (System.gc())  810872K->2081K(1023936K), 0.1287272 secs]
6.896: [GC (CMS Initial Mark)  21512K(1023936K), 0.0010619 secs]
7.398: [GC (CMS Final Remark)  691216K(1023936K), 0.1083076 secs]
681
7.586: [Full GC (System.gc())  803590K->2081K(1023936K), 0.1269813 secs]
7.714: [GC (CMS Initial Mark)  2081K(1023936K), 0.0008965 secs]

Stripping out the GC lines, the timings are:

-----------------------------ArrayList With Enough Capacity Allocated:----------
615
540
480
511
480

-----------------------------ArrayList Auto-Capacity:----------
680
708
640
644
663

Using the last three data points of each group (to avoid JIT and VM warm-up effects), the ratio comes out as:

(480 + 511 + 480) / (640 + 644 + 663) = 1471/1947 ≈ 0.76

so with GC taken out of the picture, pre-allocating the capacity makes the run roughly 25% faster.
tian