12

For example, ArrayLists in Java have a resizing factor of 2. When the array that the ArrayList is wrapped around runs out of space, all the elements of that array are transferred to a new array that is 2 times the size of the original array.

Since Python lists/arrays are naturally dynamic, what are their resizing factor? Or do they use some other method of scaling? If so, what's that method? What's its asymptotic runtime (big O)?

TomLisankie
  • 3,785
  • 7
  • 28
  • 32
  • I've described CPython's 12.5% list reallocation over in: [Where does a Python list hold its values?](https://stackoverflow.com/q/37494021/674039) – wim Sep 06 '18 at 02:31
  • Java ArrayList's resize factor is `1.5` not `2`: https://stackoverflow.com/questions/4450628/arraylist-how-does-the-size-increase/52014409 – maplemaple May 17 '21 at 23:06

2 Answers2

9

For CPython, in particular, the overallocation during resize is implemented in objects/listobject.c. It is described as

mild, but ... enough to give linear-time amortized behavior over a long sequence of appends() in the presence of a poorly-performing system realloc().

When the list needs to be grown, it is grown to be approximately 112.5% of the necessary size. In particular, the actual size is new_size + new_size >> 3 + (newsize < 9 ? 3 : 6).

nanofarad
  • 40,330
  • 4
  • 86
  • 117
  • 12
    Note: This is strictly an implementation detail of the current version of CPython (the reference interpreter). The overallocation rules have changed over time, can continue to change in the future, and likely differ on every other non-CPython interpreter (e.g. Jython might reasonably use `ArrayList` under the hood and inherit its behavior). No documented language feature requires any specific implementation. Relying on a specific resizing algorithm is a bad idea. – ShadowRanger Sep 06 '18 at 02:32
  • Still, this answer is very useful for conceptual understanding and complexity analysis, if nothing else. – flow2k Nov 29 '20 at 03:10
  • @ShadowRanger Where does one find the doc for this behavior in their favorite version of the current installed Python? How will I even know exactly which version I have? – Gulzar Jun 25 '21 at 11:03
  • 1
    @Gulzar: You don't. It's mentioned in the source code itself, and maybe the bug tracker entries that set it. It's not something important to the language itself, just an implementation detail, and documenting it would tie their hands, without providing any useful benefit. CPython is the reference interpreter, so unless you went out of your way to install some other interpreter, it's what you've got. – ShadowRanger Jun 25 '21 at 11:11
2

To answer your other question: adding an item to the end of an ArrayList or similar takes amortized O(1) time as long as there is any factor k > 1 such that reallocation always makes an array that is k times as big as the previous one.

The actual factor used doesn't really matter as long as it's bigger than one. Alternative reallocation schemes might have a different complexity. If reallocation adds a constant amount, for example, then adding an item takes amortized O(N) time.

That's why all professionally written dynamically growing arrays grow by some factor each time. The factor provides a fixed upper bound on the proportion of wasted space ((k-1)/k), traded off against a fixed upper bound on the average number of times each item could be copied in a sequence of adds.

Lets say you're using factor k and you've just performed the Nth reallocation. That means you have added about k^N items. How many items have you copied? Let the be C. Then:

C = k^(N-1) + k(N-2) + ... + 1

kC - C = k^N + k^(N-1) - k^(N-1) + k^(N-2) - k^(N-2) ... - 1

kC - C = k^N - 1

C = (k^N-1) / (k-1)

The number of copies per item added, then is C/K^N, which is pretty much 1/(k-1).

So Java's factor of k=2 means there is about one copy per add operation, with up to 50% unused slots, and CPython's factor of k=1.125 means there are about 8 copies per add operation with up to 11% unused slots.

Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87