CPython List Implementation Specifics

Question

I know a similar question is answered here: How is Python's List Implemented? but I would like to ask more about the specifics. I would like to know more about how CPython implements list resizing. I am not too familiar with C, so I have trouble reading the source code.

What I think I understand is that there is the size of the list Py_ssize_t ob_size and the amount allocated to the list Py_ssize_t allocated, and when the ob_size reaches allocated, then more memory needs to be allocated. I'm assuming that if the system allows it, the memory will be allocated in place, otherwise the list will have the be copied to another place in memory. In particular, I'm asking about the choice of how much to change allocated by. From listobject.c, the new allocated memory is the following:

new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

Essentially, we make allocated about 1/8 more than the desired object size (ignoring the constant). I wanted to know why this 1/8 is chosen? In my intro coding class I remember learning about ArrayLists which doubled in size when it was full. And perhaps increasing by 1/2 or 1/4 could have been chosen as well. The smaller the increase, the worse the amortized time from a long sequence of appends (still constant but with a larger factor), so 1/8 seems like a poor choice. My guess would be that allocating a small amount each time will increase the chance of being able to reallocate in place. Is this correct reasoning? Does this CPython implementation work well in practice?

Note: when decreasing the allocated memory of the list after removing elements, this occurs when the list has dropped to half the original size as can be seen from this part of the code:

/* Bypass realloc() when a previous overallocation is large enough to accommodate the newsize. If the newsize falls lower than half the allocated size, then proceed with the realloc() to shrink the list. */
if (allocated >= newsize && newsize >= (allocated >> 1)) {

Doubling the allocation each time is perhaps optimal from a Big-O standpoint - but in the real world, that can waste literal *gigabytes* of memory. — jasonharper, Oct 26 '21 at 16:39
Here is an interesting, recent discussion on the strategy: https://bugs.python.org/issue38373 — juanpa.arrivillaga, Oct 26 '21 at 16:57
A quote from that discussion: "We should definitely revisit the over-allocation strategy. I last worked on the existing strategy back in 2004. Since then, the weighting of the speed/space trade-off considerations have changed." — Chris Z, Oct 27 '21 at 19:21

score 4 · Accepted Answer · answered Oct 26 '21 at 16:36

Well, based on the 21-year-old commit that implemented that behavior, the why is "because it improved memory behavior on Tim Peters's Win98 box". Reproducing Tim's comments from that commit below.

Accurate timings are impossible on my Win98SE box, but this is obviously faster even on this box for reasonable list.append() cases. I give credit for this not to the resizing strategy but to getting rid of integer multiplication and divsion (in favor of shifting) when computing the rounded-up size.

For unreasonable list.append() cases, Win98SE now displays linear behavior for one-at-time appends up to a list with about 35 million elements. Then it dies with a MemoryError, due to fatally fragmented address space (there's plenty of VM available, but by this point Win9X has broken user space into many distinct heaps none of which has enough contiguous space left to resize the list, and for whatever reason Win9x isn't coalescing the dead heaps). Before the patch it got a MemoryError for the same reason, but once the list reached about 2 million elements.

Haven't yet tried on Win2K but have high hopes extreme list.append() will be much better behaved now (NT & Win2K didn't fragment address space, but suffered obvious quadratic-time behavior before as lists got large).

For other systems I'm relying on common sense: replacing integer * and / by << and >> can't plausibly hurt, the number of function calls hasn't changed, and the total operation count for reasonably small lists is about the same (while the operations are cheaper now).

...

This over-allocates proportional to the list size, making room for additional growth. The over-allocation is mild, but is enough to give linear-time amortized behavior over a long sequence of appends() in the presence of a poorly-performing system realloc() (which is a reality, e.g., across all flavors of Windows, with Win9x behavior being particularly bad -- and we've still got address space fragmentation problems on Win9x even with this scheme, although it requires much longer lists to provoke them than it used to).

The values were further tuned by Raymond Hettinger in this commit:

The Py2.3 approach overallocated small lists by up to 8 elements. The last checkin would limited this to one but slowed down (by 20 to 30%) the creation of small lists between 3 to 8 elements.

This tune-up balances the two, limiting overallocation to 3 elements (significantly reducing space consumption from Py2.3) and running faster than the previous checkin.

The first part of the growth pattern (0, 4, 8, 16) neatly meshes with allocators that trigger data movement only when crossing a power of two boundary. Also, then even numbers mesh well with common data alignments.

CPython List Implementation Specifics

1 Answers1