How to define the concept of capacity in ArrayLists?

Question

I understand that capacity is the number of elements or available spaces in an ArrayList that may or may not hold a value referencing an object. I am trying to understand more about the concept of capacity.

So I have three questions:

1) What are some good ways to define what capacity represents from a memory standpoint?

...the (contiguous?) memory allocated to the ArrayList?

...the ArrayLists’s memory footprint on the (heap?)?

2) Then if the above is true, changing capacity requires some manner of memory management overhead?

3) Anyone have an example where #2 was or could be a performance concern? Aside from maybe a large number of large ArrayLists having their capacities continually adjusted?

If you knew C you would do things like memory allocation, but yes they reallocate memory as it dynamically grows (by some factor i think 1.75 or something) It's always better to instantiate with knowing the size otherwise as you said in 3 when you add lots of data it will allocate a lot of empty space. I will let people who know a lot more about the memory allocation and how Java does it provide the real answers — Mike, Mar 23 '11 at 20:23
*(by some factor i think 1.75 or something)* the formula is `int newCapacity = (oldCapacity * 3)/2 + 1;` — bestsss, Mar 23 '11 at 20:27

Michael Borgwardt · Accepted Answer · 2011-03-24T18:30:45.367

The class is called ArrayList because it's based on an array. The capacity is the size of the array, which requires a block of contiguous heap memory. However, note that the array itself contains only references to the elements, which are separate objects on the heap.
Increasing the capacity requires allocating a new, larger array and copying all the references from the old array to the new one, after which the old one becomes eligible for garbage collection.
You've cited the main case where performance could be a concern. In practice, I've never seen it actually become a problem, since the element objects usually take up much more memory (and possibly CPU time) than the list.

score 2 · Answer 2 · answered Mar 23 '11 at 20:30

ArrayList is implemented like this:

class ArrayList {
  private Object[] elements;
}

the capacity is the size of that array.

Now, if your capacity is 10, and you're adding 11-th element, ArrayList will do this:

Object[] newElements = new Object[capacity * 1.5];
System.arraycopy(this.elements, newElements);
this.elements = newElements;

So if you start off with a small capacity, ArrayList will end up creating a bunch of arrays and copying stuff around for you as you keep adding elements, which isn't good.

On the other hand, if you specify a capacity of 1,000,000 and add only 3 elements to ArrayList, that also is kinda bad.

Rule of thumb: if you know the capacity, specify it. If you aren't sure but know the upper bound, specify that. If you just aren't sure, use the defaults.

score 1 · Answer 3 · answered Mar 23 '11 at 20:33

1

Capacity is as you described it -- the contiguous memory allocated to an ArrayList for storage of values. ArrayList stores all values in an array, and automatically resizes the array for you. This incurs memory management overhead when resizing.

If I remember correctly, Java increases the size of an ArrayList's backing array from size N to size 2N + 2 when you try to add one more element than the capacity can take. I do not know what size it increases to when you use the insert method (or similar) to insert at a specific position beyond the end of the capacity, or even whether it allows this.

Here is an example to help you think about how it works. Picture each space between the |s as a cell in the backing array:

| | |

size = 0 (contains no elements), capacity = 2 (can contain 2 elements).

|1| |

size = 1 (contains 1 element), capacity = 2 (can contain 2 elements).

|1|2|

size = 2, capacity = 2. Adding another element:

|1|2|3| | | |

size increased by 1, capacity increased to 6 (2 * 2 + 2). This can be expensive with large arrays, as allocating a large contiguous memory region can require a bit of work (as opposed to a LinkedList, which allocates many small pieces of memory) because the JVM needs to search for an appropriate location, and may need to ask the OS for more memory. It is also expensive to copy a large number of values from one place to another, which would be done once such a region was found.

My rule of thumb is this: If you know the capacity you will require, use an ArrayList because there will only be one allocation and access is very fast. If you do not know your required capacity, use a LinkedList because adding a new value always takes the same amount of work, and there is no copying involved.

answered Mar 23 '11 at 20:33

Jonathan

13,354
4
36
32

Because each new element of a linked list creates a new list node, adding elements to such a list has a constant overhead, which is in average likely bigger than adding an element to the end of an array list, even including the regrowing arrays (since these occur only seldom). I would use a linked list only if I want to add elements in front (or at an iterator-found position), or if you really need guaranteed O(1) for your additions, not only amortized O(1). – Paŭlo Ebermann Mar 23 '11 at 21:32
@Paŭlo True, but I still prefer them. And if you want a queue, they are really the only way to go. – Jonathan Mar 24 '11 at 13:21
@Paŭlo Ebermann ...You know, i just read the opposite in Murach's Java SE6. Murach states that because LinkedLists don't use arrays, inserting an element in a LinkedList can be more efficient since only the preceding and following pointers need to be changed vs. creating a whole new array as an ArrayList does.Maybe you were thinking of accessing an element which requires a lot of overhead since each element's pointers have to be stepped though to get to your destination. – Cephi Apr 01 '11 at 15:23
@Cephi: Of course, if you have an ArrayList at it's capacity (i.e. no more free space in the internal array), adding one more element takes more time than it would in a linked list. But if you add `n` elements to such a list (initially being empty), such a resizing and copying occurs only `O(log(n))` times (and copying less than `2·n` array entries), while you still would need to create `n` new list node objects and set `3·n` pointers. (This is for adding elements at the end, of course.) – Paŭlo Ebermann Apr 01 '11 at 15:29
@Cephi: If adding to the list at the beginning, the linked list is at a clear advantage, since in the ArrayList for each element we have to shift all following elements by one to the back, while in the linked-list case we just have to create one node object and set/change `4` pointers. – Paŭlo Ebermann Apr 01 '11 at 15:32
@Cephi: For adding in the middle, it balances out: In a linked list we first have to search the right node, iterating the list iterator, which takes up to `n/2` link-traversals. The adding itself does only cost the same as everytime. In a arraylist it takes no time to find the right position, but we have to copy all the following elements to shift them. – Paŭlo Ebermann Apr 01 '11 at 15:34
For adding/removing everywhere quite fast (and index-based), we would need a tree-based list, like the TreeList in Apache commons (which needs O(log n) for every type of one-element access (adding, retrieving, removing). – Paŭlo Ebermann Apr 01 '11 at 15:43
@Paŭlo Ebermann Wow you are fast. I was just just checking another book and it said what i think you were getting at: There is a memory overhead associated with the pointers of each element in a LinkedList. This may be greater than the memory used by an ArrayList unless it has an overly large capacity set. I'll edit my comment above so it does not sound like you were wrong. – Cephi Apr 01 '11 at 15:52
Memory overhead, and also a processing time overhead. But as always, measure it if you think it is the bottleneck of your application, and only then think about changing it :-) – Paŭlo Ebermann Apr 01 '11 at 16:06

score 0 · Answer 4 · edited Jun 20 '20 at 09:12

1) What are some good ways to define what capacity represents from a memory standpoint?

...the (contiguous?) memory allocated to the ArrayList?

Yes, an ArrayList is backed up by an array, to that represents the internal array size.

...the ArrayLists’s memory footprint on the (heap?)?

Yes, the larget the array capacity, the more footprint used by the arraylist.

2) Then if the above is true, changing capacity requires some manner of memory management overhead?

It is. When the list grows large enough, a larger array is allocated and the contents copied. The previous array maybe discarded and marked for garbage collection.

3) Anyone have an example where #2 was or could be a performance concern? Aside from maybe a large number of large ArrayLists having their capacities continually adjusted?

Yes, if you create the ArrayList with initial capacity of 1 ( for instance ) and your list grows way beyond that. If you know upfront the number of elements to store, you better request an initial capacity of that size.

However I think this should be low in your list of priorities, while array copy may happen very often, it is optimized since the early stages of Java, and should not be a concern. Better would be to choose a right algorithm, I think. Remember: Premature optimization is the root of all evil

See also: When to use LinkedList over ArrayList

How to define the concept of capacity in ArrayLists?

4 Answers4

Linked

Related