
I don't understand how `sys.getsizeof` behaves differently when both lists are created as literals. I expected the output of the second `getsizeof` to be equal to or less than that of the first, not greater!

>>> sys.getsizeof([0,1,2,3,4,5,6])
120
>>> sys.getsizeof([0,1,2,3,4,5])
152
Tushar Vazirani
  • On my machine, it is showing `120` and `112` – ksohan Apr 05 '22 at 20:16
  • While this is interesting, in general, you shouldn't expect `len(mylist1) < len(mylist2)` to imply `sys.getsizeof(mylist1) < sys.getsizeof(mylist2)`, since the *way* a list is constructed can influence `sys.getsizeof` – juanpa.arrivillaga Apr 05 '22 at 20:20
  • Or, for example, consider `sys.getsizeof(5*[None])` versus `sys.getsizeof([None, None, None, None, None])` – juanpa.arrivillaga Apr 05 '22 at 20:21
  • I think the only way to answer this is to look at the generated byte code. – Mark Ransom Apr 05 '22 at 20:22
  • in this case, the literals would be constructed the same way: basically, the code keeps a cached constant tuple, an empty list is created, and `extend` is called with that tuple; see the `dis.dis` output. So it's interesting that they are different. Not sure why. There has probably been some tweak to how the growth of lists behaves with smaller lists – juanpa.arrivillaga Apr 05 '22 at 20:24
  • @MarkRansom nah. The generated bytecode is practically the same; this all comes down to the details of the internal list-resizing logic. – juanpa.arrivillaga Apr 05 '22 at 20:24
  • @juanpa.arrivillaga the expectation is that the size of a literal list is known by Python, so no resizing is necessary. That's obviously not happening. – Mark Ransom Apr 05 '22 at 20:26
  • @MarkRansom yes, as I explained: if you look at the bytecode, essentially it creates an empty list, loads a constant tuple, either `(0,1,2,3,4,5)` or `(0,1,2,3,4,5,6)`, then uses `list.extend` with that tuple – juanpa.arrivillaga Apr 05 '22 at 20:29
  • so this can be reproduced with: `small = []; large = []` then `small.extend(range(6)); large.extend(range(7)); print(sys.getsizeof(small), sys.getsizeof(large))` – juanpa.arrivillaga Apr 05 '22 at 20:30
  • Source code of [list.extend](https://github.com/python/cpython/blob/a0ea7a116ce52a178c02d42b684089758bd7f355/Objects/listobject.c#L864) and the [list resizing algorithm](https://github.com/python/cpython/blob/a0ea7a116ce52a178c02d42b684089758bd7f355/Objects/listobject.c#L44) if anyone wants to take a stab at figuring out what's going on – juanpa.arrivillaga Apr 05 '22 at 20:32
  • Could this mean that resizing behaves differently based on available memory? – Tushar Vazirani Apr 05 '22 at 20:37
  • @TusharVazirani no, that isn't it. The list resizing algorithm uses heuristics, and at these small sizes, resizing a 0-sized list to 6 may result in another multiple-of-4 bin than resizing to 7. See: https://stackoverflow.com/questions/68662338/how-to-understand-the-memories-of-list-structure-take-in-python – juanpa.arrivillaga Apr 05 '22 at 20:39
  • @juanpa.arrivillaga This is as interesting as expected. – Tushar Vazirani Apr 05 '22 at 20:44
  • @juanpa.arrivillaga that's why I suggested looking at the byte code, because none of that is obvious from looking at the Python code. But you seem to have proven that you need to dig even deeper. – Mark Ransom Apr 05 '22 at 20:44
  • This seems to be the most recent change to the algorithm, from 2 years ago: https://github.com/python/cpython/pull/18952 – juanpa.arrivillaga Apr 05 '22 at 20:54
  • This seems to be related: [How did print(*a, a.pop(0)) change?](/q/70404485/4518341) – wjandrea Apr 06 '22 at 00:38
  • @ksohan This behaviour seems to have changed in CPython 3.9. I can't reproduce it in CPython 3.8. – wjandrea Apr 06 '22 at 00:44
  • @wjandrea true, that seems to explain why the literal syntax is now using `LIST_EXTEND` – juanpa.arrivillaga Apr 06 '22 at 02:06

1 Answer


Short story:

It's about overallocation and avoiding useless overallocation. Your cases have 6 and 7 elements. In both cases, Python first calculates 12 as the number of spots to allocate. The purpose of overallocation is to allow fast future extensions with more elements, so Python tries to guess what will happen in the future and acts accordingly.

For the 6-element case it thinks: "Hmm, if we add another 6 elements, it would indeed be good to already have 12 spots, so let's allocate them now."

For the 7-element case it thinks: "Hmm, if we add another 7 elements, 12 spots wouldn't be enough anyway (we'd then have 14 elements), so we'd have to re-overallocate regardless; let's just not overallocate now. Maybe there won't even be another extension."

So for 6 elements it allocates 12 spots and for 7 it allocates 8 spots (minimal overallocation to a multiple of 4). That's 4 spots difference. A spot holds a pointer to an object, which with 64-bit Python takes 8 bytes. So 7 elements need 4*8 = 32 fewer bytes than 6 elements, which is what you observed (120 bytes vs 152 bytes).
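We can check that arithmetic by transcribing the growth formula from `Objects/listobject.c` into Python (a sketch: `spots_allocated` is just an illustrative name, not a CPython function):

```python
# Transcription of the CPython 3.9+ list growth heuristic from
# Objects/listobject.c; the function name is made up for illustration.
def spots_allocated(newsize, oldsize=0):
    new_allocated = (newsize + (newsize >> 3) + 6) & ~3
    # Do not overallocate if the new size is closer to the
    # overallocated size than to the old size.
    if newsize - oldsize > new_allocated - newsize:
        new_allocated = (newsize + 3) & ~3
    return new_allocated

print(spots_allocated(6))   # 12 spots
print(spots_allocated(7))   # 8 spots
print((spots_allocated(6) - spots_allocated(7)) * 8)  # 32 bytes, the observed gap
```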

Long story:

I can reproduce it in CPython 3.10.0. Here's what happens:

>>> import dis
>>> dis.dis('[0,1,2,3,4,5,6]')
  1           0 BUILD_LIST               0
              2 LOAD_CONST               0 ((0, 1, 2, 3, 4, 5, 6))
              4 LIST_EXTEND              1
              6 RETURN_VALUE

An empty list is built and then extended with that tuple. The extend first resizes the list to make room for the elements, using this to compute how many spots to allocate:

    new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;
    /* Do not overallocate if the new size is closer to overallocated size
     * than to the old size.
     */
    if (newsize - Py_SIZE(self) > (Py_ssize_t)(new_allocated - newsize))
        new_allocated = ((size_t)newsize + 3) & ~(size_t)3;

Let's test that in Python:

>>> import sys
>>> oldsize = 0  # we extend an empty list
>>> for newsize in range(10):
...     new_allocated = (newsize + (newsize >> 3) + 6) & ~3
...     if newsize - oldsize > new_allocated - newsize:
...         new_allocated = (newsize + 3) & ~3
...     s = f'[{",".join(map(str, range(newsize)))}]'
...     calculated_size = sys.getsizeof([]) + 8 * new_allocated
...     actual_size = sys.getsizeof(eval(s))
...     print(f'{s:20}  {calculated_size=}  {actual_size=}')
... 
[]                    calculated_size=88  actual_size=56
[0]                   calculated_size=88  actual_size=64
[0,1]                 calculated_size=120  actual_size=72
[0,1,2]               calculated_size=120  actual_size=120
[0,1,2,3]             calculated_size=120  actual_size=120
[0,1,2,3,4]           calculated_size=120  actual_size=120
[0,1,2,3,4,5]         calculated_size=152  actual_size=152
[0,1,2,3,4,5,6]       calculated_size=120  actual_size=120
[0,1,2,3,4,5,6,7]     calculated_size=120  actual_size=120
[0,1,2,3,4,5,6,7,8]   calculated_size=152  actual_size=152

Our calculated size matches the actual size. Except for fewer than three elements, but that's because those lists aren't created by extending like this (I'll show that at the end), so it's no surprise our formula doesn't apply there.

Let's look at the code again:

    new_allocated = (newsize + (newsize >> 3) + 6) & ~3
    if newsize - oldsize > new_allocated - newsize:
        new_allocated = (newsize + 3) & ~3

And the values for your cases:

              [0,1,2,3,4,5]    [0,1,2,3,4,5,6]

oldsize             0                 0
newsize             6                 7
new_allocated      12                12
    corrected      12                 8

The reasoning from the code again:

    /* Do not overallocate if the new size is closer to overallocated size
     * than to the old size.

The newsize 7 is closer to 12 than to 0, so it decides not to overallocate (well, it does overallocate to the closest multiple of 4, for memory alignment and because that appears to work well).
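That fallback path only rounds up to the next multiple of 4, which you can see directly:

```python
# The no-overallocation fallback: round newsize up to a multiple of 4.
for newsize in range(1, 9):
    print(newsize, (newsize + 3) & ~3)  # 1..4 -> 4, 5..8 -> 8
```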

The reasoning behind that, as stated by Serhiy Storchaka in the proposal:

  1. It is common enough case if the list is created from a sequence of a known size and do not adds items anymore. Or if it is created by concatenating of few sequences. In such cases the list can overallocate a space which will never be used. [...] My idea, that if we adds several items at a time and need to reallocate an array, we check if the overallocated size is enough for adding the same amount of items next time. If it is not enough, we do not overallocate. [...] It will save a space if extend a list by many items few times.

So the idea is to consider future growths of the same size, and if the next growth would require a new overallocation anyway, the current overallocation wouldn't help, so let's not do it.
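We can simulate that scenario with the formula (using a hypothetical `grow` helper, not a CPython API): extending by 7 items twice keeps every allocation close to the needed size instead of compounding overallocations.

```python
# Hypothetical helper mirroring the CPython 3.9+ resize heuristic;
# `grow` is not a CPython API, just an illustration.
def grow(allocated, oldsize, newsize):
    if newsize <= allocated:
        return allocated  # enough room already, no resize needed
    new_allocated = (newsize + (newsize >> 3) + 6) & ~3
    # Skip overallocation when it couldn't absorb a same-sized extend anyway.
    if newsize - oldsize > new_allocated - newsize:
        new_allocated = (newsize + 3) & ~3
    return new_allocated

a = grow(0, 0, 7)    # first extend by 7: only 8 spots, not 12
b = grow(a, 7, 14)   # second extend by 7: resizes again, to 16
print(a, b)          # 8 16
```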

About the sizes up to two elements: those don't use a tuple and `LIST_EXTEND`; instead, the individual values are pushed onto the stack and the list is built directly by `BUILD_LIST` (note its argument 0, 1 or 2):

>>> dis.dis('[]')
  1           0 BUILD_LIST               0
              2 RETURN_VALUE
>>> dis.dis('[1969]')
  1           0 LOAD_CONST               0 (1969)
              2 BUILD_LIST               1
              4 RETURN_VALUE
>>> dis.dis('[1969, 1956]')
  1           0 LOAD_CONST               0 (1969)
              2 LOAD_CONST               1 (1956)
              4 BUILD_LIST               2
              6 RETURN_VALUE

The code to execute `BUILD_LIST` builds a new list object with the exact number of spots needed (the number `oparg`: 0, 1 or 2), no overallocation. And then right there it just uses a quick little loop to pop the values off the stack and put them into the list:

        case TARGET(BUILD_LIST): {
            PyObject *list = PyList_New(oparg);
            if (list == NULL)
                goto error;
            while (--oparg >= 0) {
                PyObject *item = POP();
                PyList_SET_ITEM(list, oparg, item);
            }
            PUSH(list);
            DISPATCH();
        }
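You can see that exact allocation in the numbers (assuming a 64-bit CPython 3.9+, where each spot is an 8-byte pointer):

```python
import sys

# BUILD_LIST allocates exactly oparg spots, so tiny literals carry no
# spare capacity: each element adds exactly one 8-byte pointer on top
# of the empty-list size.
print(sys.getsizeof([1969]) - sys.getsizeof([]))        # 8
print(sys.getsizeof([1969, 1956]) - sys.getsizeof([]))  # 16
```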
Kelly Bundy
  • So basically the heuristic is unstable and messes up for a length of 6, leading to an "odd one out" situation where the list's capacity is way overkill. – Masklinn Apr 06 '22 at 13:10
  • @Masklinn Well, not sure if "messes up" and "overkill" are true; that depends on whether Python's guess/prediction of another 6 elements to add turns out to be true. Added more details now, particularly the "short story" at the beginning. – Kelly Bundy Apr 06 '22 at 13:59
  • So the whole thing is based on some non-obvious non-linearity in the overallocation strategy. Fascinating. I wonder if the strategy was data-based or if somebody just used their intuition? – Mark Ransom Apr 06 '22 at 17:41
  • @MarkRansom At Serhiy's proposal/issue, Raymond Hettinger wrote: *"We should definitely revisit the over-allocation strategy. I last worked on the existing strategy back in 2004. Since then, the weighting of the speed/space trade-off considerations have changed."* Sounds to me, and from other things that's also my understanding of how Raymond usually works, that the old formula was data-based. The new formula is similar to the old, and the changes seem based on just logic (part of it shown in my answer, the rest in the proposal/issue). Especially the change to not use odd-size allocations. – Kelly Bundy Apr 06 '22 at 17:58