I know a similar question has been answered here: "How is Python's List Implemented?" However, I would like to ask about the specifics of how CPython implements list resizing. I am not very familiar with C, so I have trouble reading the source code.
What I think I understand is that there is the size of the list, Py_ssize_t ob_size, and the number of slots allocated to the list, Py_ssize_t allocated. When ob_size reaches allocated, more memory needs to be allocated. I'm assuming that if the system allows it, the memory will be extended in place; otherwise, the list will have to be copied to another place in memory. In particular, I'm asking about the choice of how much to change allocated by. From listobject.c, the new allocation size is computed as follows:
new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
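To make the formula concrete, here is a minimal C sketch of a toy dynamic array that mirrors the ob_size/allocated pair. The names toy_list, toy_new_allocated and toy_append are hypothetical, the items are plain longs rather than PyObject pointers, and the overflow checks and other details of the real list_resize() are omitted, so treat it as an illustration of the growth rule rather than CPython's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy list standing in for CPython's ob_size / allocated pair.
   Items are plain longs instead of PyObject pointers. */
typedef struct {
    long *items;
    size_t size;      /* like ob_size: number of slots in use */
    size_t allocated; /* like allocated: number of slots reserved */
} toy_list;

/* Over-allocation rule quoted above: newsize + newsize/8 + (3 or 6).
   Overflow checks and the newsize == 0 special case are omitted. */
static size_t toy_new_allocated(size_t newsize)
{
    return newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
}

/* Append one item, growing the buffer only when the new size would exceed
   allocated. Returns 0 on success, -1 if realloc() fails. */
static int toy_append(toy_list *list, long value)
{
    size_t newsize = list->size + 1;
    if (newsize > list->allocated) {
        size_t new_allocated = toy_new_allocated(newsize);
        /* realloc() may extend the block in place or move it elsewhere;
           either way the existing items are preserved. */
        long *items = realloc(list->items, new_allocated * sizeof *list->items);
        if (items == NULL)
            return -1;
        list->items = items;
        list->allocated = new_allocated;
    }
    list->items[list->size] = value;
    list->size = newsize;
    assert(list->size <= list->allocated);
    return 0;
}
```

Starting from toy_list l = {NULL, 0, 0}; and appending one item at a time, l.allocated steps through 4, 8, 16, 25, 35, 46, 58, 72, 88, so each resize adds roughly 1/8 of the current size plus a small constant.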
Essentially, allocated is set to about 1/8 more than the requested size (ignoring the constant). Why was 1/8 chosen? In my intro coding class, I remember learning about ArrayLists, which doubled in size when they were full, and increasing by 1/2 or 1/4 could have been chosen as well. The smaller the increase, the worse the amortized time over a long sequence of appends (still constant, but with a larger factor), so 1/8 seems like a poor choice. My guess is that allocating a small amount each time increases the chance of being able to reallocate in place. Is this reasoning correct? Does this CPython implementation work well in practice?
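The amortized-cost argument can be checked with a small counting experiment. The C sketch below (all function names are hypothetical) simulates n single-element appends under the quoted 1/8 rule and under textbook doubling, counting resizes and element copies under the pessimistic assumption that every resize moves the whole buffer, i.e. realloc() never extends in place.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t (*growth_rule)(size_t newsize);

/* Growth rule quoted in the question: roughly 1/8 extra plus a constant. */
static size_t grow_eighth(size_t newsize)
{
    return newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
}

/* ArrayList-style doubling: smallest power of two >= newsize, minimum 4
   (the minimum of 4 is an arbitrary choice for this comparison). */
static size_t grow_double(size_t newsize)
{
    size_t cap = 4;
    while (cap < newsize)
        cap *= 2;
    return cap;
}

/* Simulate n appends starting from an empty list. Counts how many times the
   buffer is resized and how many existing elements are copied, assuming the
   worst case where every resize moves the data to a new block. */
static void simulate_appends(growth_rule grow, size_t n,
                             size_t *resizes, size_t *copies)
{
    size_t size = 0, allocated = 0;
    *resizes = 0;
    *copies = 0;
    while (size < n) {
        size_t newsize = size + 1;
        if (newsize > allocated) {
            allocated = grow(newsize);
            assert(allocated >= newsize); /* rule must leave enough room */
            *resizes += 1;
            *copies += size; /* every existing item is moved */
        }
        size = newsize;
    }
}
```

For n = 1,000,000, doubling performs 19 resizes and 1,048,572 copies in total, about one copy per append. The 1/8 rule performs many more resizes and several times as many copies, but the copies per append stay bounded (below 9 in this worst case), which matches the "still constant but with a larger factor" intuition. The sketch only quantifies the trade-off; it does not show why CPython picked 1/8, and real costs also depend on how often realloc() can grow a block in place.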
Note: when elements are removed, the list's allocated memory is only decreased once the new size drops below half of the allocated size, as can be seen from this part of the code:
/* Bypass realloc() when a previous overallocation is large enough
   to accommodate the newsize.  If the newsize falls lower than half
   the allocated size, then proceed with the realloc() to shrink the list.
*/
if (allocated >= newsize && newsize >= (allocated >> 1)) {
    /* ... */
}
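The shrink condition can be isolated into a small predicate. This is a hedged C sketch: needs_realloc is a hypothetical name, and it only reproduces the if-test quoted above, not the reallocation that follows it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the quoted test: the current buffer is kept when it can hold
   newsize AND newsize is still at least half of allocated. Otherwise the
   list goes through realloc(), either to grow or to shrink. */
static bool needs_realloc(size_t allocated, size_t newsize)
{
    bool keep_buffer = allocated >= newsize && newsize >= (allocated >> 1);
    return !keep_buffer;
}
```

For example, with allocated == 88, shrinking to 44 items keeps the buffer (44 >= 88 >> 1), shrinking to 43 triggers realloc(), and growing to 89 also triggers it. Between those two thresholds, any sequence of appends and pops leaves the buffer untouched.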