161

The usual constructor of ArrayList is:

ArrayList<?> list = new ArrayList<>();

But there is also an overloaded constructor with a parameter for its initial capacity:

ArrayList<?> list = new ArrayList<>(20);

Why is it useful to create an ArrayList with an initial capacity when we can append to it as we please?

Rob
  • 15,732
  • 22
  • 69
  • 107
  • 17
    Have you tried to see the ArrayList source code? – AmitG Mar 15 '13 at 10:45
  • @Joachim Sauer: Sometime we get a cognizance when we read source carefully. I was giving a try if he has read the source. I understood your aspect. Thanks. – AmitG Mar 15 '13 at 13:11
  • ArrayList is poor performing period, why would you want to use such a structure – PositiveGuy Mar 30 '13 at 04:29
  • 1
    possible duplicate of [what's meant by parameter (int initial capacity) in an arraylist](http://stackoverflow.com/questions/4172480/whats-meant-by-parameter-int-initial-capacity-in-an-arraylist) – Patrick May 13 '13 at 11:23

11 Answers11

207

If you know in advance what the size of the ArrayList is going to be, it is more efficient to specify the initial capacity. If you don't do this, the internal array will have to be repeatedly reallocated as the list grows.

The larger the final list, the more time you save by avoiding the reallocations.

That said, even without pre-allocation, inserting n elements at the back of an ArrayList is guaranteed to take total O(n) time. In other words, appending an element is an amortized constant-time operation. This is achieved by having each reallocation increase the size of the array exponentially, typically by a factor of 1.5. With this approach, the total number of operations can be shown to be O(n).

NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • 5
    While pre-allocating known sizes is a good idea, not doing it is usually not terrible: you will need about *log(n)* re-allocations for a list with a final size of *n*, which is not a lot. – Joachim Sauer Mar 15 '13 at 11:48
  • 1
    So why isn't it `O(nlogn)`, since it's reallocated at every power of 1.5 (more or less)? – Peter Olson Mar 15 '13 at 15:43
  • 3
    @PeterOlson `O(n log n)` would be doing `log n` work `n` times. That's a gross overestimate (though *technically* correct with big O due to it being an upper bound). It copies s + s*1.5 + s*1.5^2 + ... + s*1.5^m (such that s*1.5^m < n < s*1.5^(m+1)) elements in total. I'm no good at sums so I can't give you the precise math off the top of my head (for resizing factor 2, it's 2n, so it may be 1.5n give or take a small constant), but it doesn't take too much squinting to see that this sum is at most a constant factor larger than n. So it takes O(k*n) copies, which is of course O(n). –  Mar 15 '13 at 16:03
  • @NPE Pah, that's cheating ;-) –  Mar 15 '13 at 16:10
  • 1
    @delnan: Can't argue with that! ;) BTW, I really liked your squinting argument; will add it to my repertoire of tricks. – NPE Mar 15 '13 at 16:17
  • 6
    It's easier to do the argument with doubling. Suppose you double when full, starting with one element. Suppose you want to insert 8 elements. Insert one (cost: 1). Insert two -- double, copy one element and insert two (cost: 2). Insert three -- double, copy two elements, insert three (cost: 3). Insert four (cost: 1). Insert five -- double, copy four elements, insert five (cost: 5). Insert six, seven and eight (cost: 3). Total cost: 1 + 2 + 3 + 1 + 5 + 3 = 16, which is *twice* the number of elements inserted. From this sketch you can prove that the *average* cost is *two per insert* in general. – Eric Lippert Mar 15 '13 at 16:55
  • 9
    That's the cost *in time*. You can also see though that the amount of *wasted space* changed over time, being 0% some of the time and close to 100% some of the time. Changing the factor from 2 to 1.5 or 4 or 100 or whatever changes the average amount of wasted space and the average amount of time spent copying, but the time complexity remains linear on average no matter what the factor is. – Eric Lippert Mar 15 '13 at 16:58
  • For a very large list, does repeated reallocation to accommodate more entries have any material impact on heap space? – raffian Aug 26 '13 at 20:44
  • 1
    the initial capacity is REALLY important when dealing with large number of items. for instance, I had a bug in my ETL application which was related to this topic. I got it resolved by increasing this init capacity from 1000 to 10,000. it was making a huge trouble for me for months but I noticed that when number of processing files are exceeding 1000 my application panics. and interestingly, the issue totally recovered by increasing this parameter. thank you guys, – mostafa.S Apr 18 '15 at 09:29
  • agree with user395760, while i don't agree with @Joachim Sauer. –  Apr 22 '21 at 15:39
42

Because ArrayList is a dynamically resizing array data structure, which means it is implemented as an array with an initial (default) fixed size. When this gets filled up, the array will be extended to a double sized one. This operation is costly, so you want as few as possible.

So, if you know your upper bound is 20 items, then creating the array with initial length of 20 is better than using a default of, say, 15 and then resize it to 15*2 = 30 and use only 20 while wasting the cycles for the expansion.

P.S. - As AmitG says, the expansion factor is implementation specific (in this case (oldCapacity * 3)/2 + 1)

Iulius Curt
  • 4,984
  • 4
  • 31
  • 55
26

Default size of Arraylist is 10.

/**
 * Constructs an empty list with an initial capacity of ten.
 */
public ArrayList() {
    this(10);
}   

So if you are going to add 100 or more records, you can see the overhead of memory reallocation.

ArrayList<?> list = new ArrayList<>();    
// same as  new ArrayList<>(10);      

So if you have any idea about the number of elements which will be stored in Arraylist its better to create Arraylist with that size instead of starting with 10 and then going on increasing it.

Ajinkya
  • 22,324
  • 33
  • 110
  • 161
18

I actually wrote a blog post on the topic 2 months ago. The article is for C#'s List<T> but Java's ArrayList has a very similar implementation. Since ArrayList is implemented using a dynamic array, it increases in size on demand. So the reason for the capacity constructor is for optimisation purposes.

When one of these resizings operation occurs, the ArrayList copies the contents of the array into a new array that is twice the capacity of the old one. This operation runs in O(n) time.

Example

Here is an example of how the ArrayList would increase in size:

10
16
25
38
58
... 17 resizes ...
198578
297868
446803
670205
1005308

So the list starts with a capacity of 10, when the 11th item is added it is increase by 50% + 1 to 16. On the 17th item the ArrayList is increased again to 25 and so on. Now consider the example where we're creating a list where the desired capacity is already known as 1000000. Creating the ArrayList without the size constructor will call ArrayList.add 1000000 times which takes O(1) normally or O(n) on resize.

1000000 + 16 + 25 + ... + 670205 + 1005308 = 4015851 operations

Compare this using the constructor and then calling ArrayList.add which is guaranteed to run in O(1).

1000000 + 1000000 = 2000000 operations

Java vs C#

Java is as above, starting at 10 and increasing each resize at 50% + 1. C# starts at 4 and increases much more aggressively, doubling at each resize. The 1000000 adds example from above for C# uses 3097084 operations.

References

Daniel Imms
  • 47,944
  • 19
  • 150
  • 166
10

Setting the initial size of an ArrayList, e.g. to ArrayList<>(100), reduces the number of times the re-allocation of internal memory has to occur.

Example:

ArrayList example = new ArrayList<Integer>(3);
example.add(1); // size() == 1
example.add(2); // size() == 2, 
example.add(2); // size() == 3, example has been 'filled'
example.add(3); // size() == 4, example has been 'expanded' so that the fourth element can be added. 

As you see in the above example - an ArrayList can be expanded if needed to be. What this doesn't show you is that the size of the Arraylist usually doubles (although note that the new size depends on your implementation). The following is quoted from Oracle:

"Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost."

Obviously, if you have no idea as to what kind of range you will be holding, setting the size probably won't be a good idea - however, if you do have a specific range in mind, setting an initial capacity will increase memory efficiency.

dsgriffin
  • 66,495
  • 17
  • 137
  • 137
3

ArrayList can contain many values and when doing large initial insertions you can tell ArrayList to allocate a larger storage to begin with as to not waste CPU cycles when it tries to allocate more space for the next item. Thus to allocate some space at the beginning is more effiecient.

Sanober Malik
  • 2,765
  • 23
  • 30
3

This is to avoid possible efforts for reallocation for every single object.

int newCapacity = (oldCapacity * 3)/2 + 1;

internally new Object[] is created.
JVM needs effort to create new Object[] when you add element in the arraylist. If you don't have above code(any algo you think) for reallocation then every time when you invoke arraylist.add() then new Object[] has to be created which is pointless and we are loosing time for increasing size by 1 for each and every objects to be added. So it is better to increase size of Object[] with following formula.
(JSL has used forcasting formula given below for dynamically growing arraylist instead of growing by 1 every time. Because to grow it takes effort by JVM)

int newCapacity = (oldCapacity * 3)/2 + 1;
AmitG
  • 10,365
  • 5
  • 31
  • 52
  • ArrayList will _not_ perform reallocation for each single `add` - it already uses some growth formula internally. Hence the question is not answered. – A.H. Mar 15 '13 at 12:27
  • @A.H. My answer is for **negative testing**. Kindly read between the lines. I said _"If you don't have above code(any algo you think) for reallocation then every time when you invoke arraylist.add() then new Object[] has to be created which is pointless and we are loosing time."_ and the **code** is `int newCapacity = (oldCapacity * 3)/2 + 1;` which is present in ArrayList class. Do you still think it is unanswered? – AmitG Mar 15 '13 at 12:32
  • 1
    I still think it is not answered: In `ArrayList` the amortized reallocation takes place in _any_ case with _any_ value for the initial capacity. And the question is about: Why use a non-standard value for the initial capacity at all? Besides of this: "reading between the lines" is not something desired in a technical answer. ;-) – A.H. Mar 15 '13 at 12:59
  • @A.H. I am answering like, what had happened if we wouldn't have reallocation process in ArrayList. So is the answer. Try to read spirit of the answer :-). I better know _In ArrayList the amortized reallocation takes place in any case with any value for the initial capacity._ – AmitG Mar 15 '13 at 13:08
2

I think each ArrayList is created with an init capacity value of "10". So anyway, if you create an ArrayList without setting capacity within constructor it will be created with a default value.

sk2212
  • 1,688
  • 4
  • 23
  • 43
2

I'd say its an optimization. ArrayList without initial capacity will have ~10 empty rows and will expand when you are doing an add.

To have a list with exactly the number of items you need to call trimToSize()

Daniel Magnusson
  • 9,541
  • 2
  • 38
  • 43
0

As per my experience with ArrayList, giving an initial capacity is a nice way to avoid reallocation costs. But it bears a caveat. All suggestions mentioned above say that one should provide initial capacity only when a rough estimate of the number of elements is known. But when we try to give an initial capacity without any idea, the amount of memory reserved and unused will be a waste as it may never be required once the list is filled to required number of elements. What i am saying is, we can be pragmatic at the beginning while allocating capacity, and then find a smart way of knowing required minimal capacity at runtime. ArrayList provides a method called ensureCapacity(int minCapacity). But then, one has find a smart way...

-1

I have tested ArrayList with and without initialCapacity and I got suprising result
When I set LOOP_NUMBER to 100,000 or less the result is that setting initialCapacity is efficient.

list1Sttop-list1Start = 14
list2Sttop-list2Start = 10


But when I set LOOP_NUMBER to 1,000,000 the result changes to:

list1Stop-list1Start = 40
list2Stop-list2Start = 66


Finally, I couldn't figure out how does it works?!
Sample code:

 public static final int LOOP_NUMBER = 100000;

public static void main(String[] args) {

    long list1Start = System.currentTimeMillis();
    List<Integer> list1 = new ArrayList();
    for (int i = 0; i < LOOP_NUMBER; i++) {
        list1.add(i);
    }
    long list1Stop = System.currentTimeMillis();
    System.out.println("list1Stop-list1Start = " + String.valueOf(list1Stop - list1Start));

    long list2Start = System.currentTimeMillis();
    List<Integer> list2 = new ArrayList(LOOP_NUMBER);
    for (int i = 0; i < LOOP_NUMBER; i++) {
        list2.add(i);
    }
    long list2Stop = System.currentTimeMillis();
    System.out.println("list2Stop-list2Start = " + String.valueOf(list2Stop - list2Start));
}

I have tested on windows8.1 and jdk1.7.0_80

Hamedz
  • 726
  • 15
  • 27
  • 2
    hi, unfortunately currentTimeMillis tolerance is of up hundred milliseconds (depending), meaning that the result is hardly reliable. I'd suggest to use some custom library to do it right. – Bogdan Mar 20 '18 at 10:11
  • 1
    See also: [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/questions/504103) – Stephen C May 18 '22 at 02:07