
Comparing the creation of a large ArrayList with an initialCapacity, I found that it's slower than creating it without one. Here is the simple program I wrote to measure it:

long start2 = System.nanoTime();
List<Double> col = new ArrayList<>(30000000); // <--- Here
for (int i = 0; i < 30000000; i++) {
    col.add(Math.sqrt(i + 1));
}
long end2 = System.nanoTime();
System.out.println(end2 - start2);
System.out.println(col.get(12411325).hashCode() == System.nanoTime()); // use the result so the loop isn't optimized away

The average result for new ArrayList<>(30000000): 6121173329

The average result for new ArrayList<>(): 4883894100

on my machine. I thought it would be faster to create the large array once rather than recreating it each time we go beyond the capacity of the current underlying array of the ArrayList. Eventually we should end up with an array of size greater than or equal to 30000000.

I thought it was an optimization, but it actually turned out to be a pessimization. Why is that?
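As a rough illustration of the reallocation cost in question, here is a small sketch (a model, not a measurement) that assumes OpenJDK's ArrayList starts at capacity 10 and grows by roughly 1.5x (newCapacity = oldCapacity + (oldCapacity >> 1)); it counts how many times the backing array would be recreated and how many elements get copied along the way:

public class GrowthSimulation {
    public static void main(String[] args) {
        final long elements = 30_000_000L;
        long capacity = 10;      // capacity allocated on the first add
        int reallocations = 0;
        long copied = 0;         // total elements copied across all grow() calls

        while (capacity < elements) {
            copied += capacity;                      // the full old array is copied on grow
            capacity = capacity + (capacity >> 1);   // ~1.5x growth
            reallocations++;
        }

        System.out.println("reallocations   = " + reallocations);
        System.out.println("elements copied = " + copied);
        System.out.println("final capacity  = " + capacity);
    }
}

With the initial capacity set to 30000000, none of that copying happens, which is why one would expect the preallocated variant to be faster.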

user3663882
  • @ManoDestra So what is the purpose of the initial capacity, and when should it be used? – user3663882 Aug 05 '16 at 16:14
  • When you add the `30,000,000`th element, the `List` expands to `60,000,000` elements. Try increasing the initial capacity by `1` (but do not add another number to the `List`). – Elliott Frisch Aug 05 '16 at 16:25
  • @ManoDestra How come it is "way too high", if it is _exactly_ the size of the List after the `for` loop? It would surprise me if the OP's findings were actually true, because without the initial capacity set, the ArrayList will have to reallocate and copy itself `ceil(log2(30000000/10)) = 22` times and end up with an internal capacity of `41943040`, more than 30% too much. – AlexR Aug 05 '16 at 16:26
  • @ElliottFrisch If you can dig up the source code where that is written, this will be an excellent answer! – AlexR Aug 05 '16 at 16:27
  • @ElliottFrisch are you sure? I just tested your suggestion and got the same results: the 2nd variation took only 2/3 of the time. However when I use `Integer` and do a `col.add(i);`, then the first variation is slightly faster! – maraca Aug 05 '16 at 16:28
  • @ElliottFrisch I have tried it, and the result was almost the same. – user3663882 Aug 05 '16 at 16:32
  • How are you actually timing this? Just once? Averaged over multiple runs? Different order? Ramp-up time? – copeg Aug 05 '16 at 16:32
  • @copeg I ran the same program multiple times. It was not in a loop. – user3663882 Aug 05 '16 at 16:33
  • If I run this averaged over multiple runs, together with an initial ramp-up time, setting the initial capacity is faster time and again – copeg Aug 05 '16 at 16:36
  • All sources I could find claim that the capacity is only increased when adding an element would lead to an overflow, not before. As for the size, I read that it grows by `* 3 / 2 + 1` rather than exactly doubling, but it's not specified in the API. – maraca Aug 05 '16 at 16:52
  • If you're doing a small benchmark, I suggest reading through this question: http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java – Jeutnarg Aug 05 '16 at 16:53
  • If you think about it, the capacity can't be doubled, because an initial capacity of 0 is allowed. Therefore I think the given formula is probably correct, although the API only states that the default initial capacity is 10 and says nothing about the growth of the ArrayList. – maraca Aug 05 '16 at 17:17
  • @maraca Thanks. Left brain at home this morning :D – ManoDestra Aug 05 '16 at 19:52
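A quick (non-portable) way to check the internal capacity discussed in these comments is to read the ArrayList's backing array via reflection. This sketch assumes the OpenJDK field name elementData and may require --add-opens java.base/java.util=ALL-UNNAMED on recent JDKs:

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class CapacityCheck {
    public static void main(String[] args) throws Exception {
        List<Double> col = new ArrayList<>();
        for (int i = 0; i < 30000000; i++) {
            col.add(Math.sqrt(i + 1));
        }

        // Peek at the backing array; "elementData" is the OpenJDK field name.
        Field f = ArrayList.class.getDeclaredField("elementData");
        f.setAccessible(true);
        Object[] backing = (Object[]) f.get(col);

        System.out.println("size     = " + col.size());
        System.out.println("capacity = " + backing.length);
    }
}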

1 Answer


I ran the same program multiple times. It was not in a loop

Consider how you are profiling the code - if you include both a 'ramp up time' (to take into account things such as JIT) and average over several calls (to gather some statistics/distribution), the timing may lead you to a different conclusion. For example:

public static void main(String[] args){
    //Warm up
    System.out.println("Warm up");
    for ( int i = 0; i < 5; i++ ){
        dynamic();
        constant();
    }
    System.out.println("Timing...");
    //time
    long e = 0; // accumulated time without an initial capacity (dynamic)
    long s = 0; // accumulated time with the initial capacity set (constant)
    int total = 5;
    for ( int i = 0; i < total; i++ ){
        long e1 = dynamic();
        System.out.print(e1 + "\t");
        e += e1;
        long s1 = constant();
        System.out.println(s1);
        s += s1;
    }
    System.out.println("Static Avg: " + (s/total));
    System.out.println("Dynamic Avg: " + (e/total));

}

private static long dynamic(){
    long start2 = System.currentTimeMillis();
    List<Double> col = new ArrayList<>();
    for (int i = 0; i < 30000000; i++) {
        col.add(Math.sqrt(i + 1));
    }
    long end2 = System.currentTimeMillis();
    return end2 - start2;
}

private static long constant(){
    long start2 = System.currentTimeMillis();
    List<Double> col = new ArrayList<>(30000000); 
    for (int i = 0; i < 30000000; i++) {
        col.add(Math.sqrt(i + 1));
    }
    long end2 = System.currentTimeMillis();
    return end2 - start2;
}

On my system, setting the initial capacity is consistently faster, though not by orders of magnitude.

Edit: As suggested in a comment, consider reading through How do I write a correct micro-benchmark in Java?
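For completeness, here is a minimal sketch of what such a benchmark could look like under JMH, the harness discussed in that question. The class and method names are illustrative, and it assumes the jmh-core and jmh-generator-annprocess dependencies are on the classpath:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@Fork(1)
public class ArrayListCapacityBenchmark {

    private static final int N = 30_000_000;

    @Benchmark
    public List<Double> withInitialCapacity() {
        List<Double> col = new ArrayList<>(N);
        for (int i = 0; i < N; i++) {
            col.add(Math.sqrt(i + 1));
        }
        return col; // returning the list keeps the work from being optimized away
    }

    @Benchmark
    public List<Double> withDefaultCapacity() {
        List<Double> col = new ArrayList<>();
        for (int i = 0; i < N; i++) {
            col.add(Math.sqrt(i + 1));
        }
        return col;
    }
}

JMH takes care of the warm-up, forking and statistics that the hand-rolled loop above only approximates.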

copeg
  • My results: Static Avg: 7544; Dynamic Avg: 2379 !? – maraca Aug 05 '16 at 17:01
  • Consider a) not just the mean but the distribution b) the rules in the link in edit (my code doesn't necessarily abide by all of them) – copeg Aug 05 '16 at 17:17