1

I was experimenting with how Python allocates the memory, so found the same issue like Size of list in memory and Eli describes in a much better way. His answer leads me to the new doubt that, I checked the size of 1 + [] and [1], but it is different as you can see in the code snippet. if I'm not wrong memory space allocation should be the same. But it's not the case. Anyone can help me with the understanding?

>>> import sys
>>> sys.getsizeof(1)
    28
>>> sys.getsizeof([])
    64
>>> 28 + 64
    92
>>> sys.getsizeof([1])
    72
Parth Patel
  • 68
  • 1
  • 9
  • 5
    I don't really understand what you don't understand. Nowhere are you comparing the size of the same objects... But perhaps what you are missing is that `sys.getsizeof` only gives you the size of *the object itself*, not any other objects that might be referenced by the argument. So, the size of `[1]` is just the list object, which includes under the hood an array of PyObject pointers. – juanpa.arrivillaga Feb 12 '21 at 20:21
  • 3
    This is because the list holds a reference to some other object. The `sizeof` of the reference is 72-64=8. So you can put a lot of big elements into your list, but the list will still be of size 64+(8*len) (I'm skipping a step about list resizing allocation here, so this is only *mostly* accurate). Each object in turn will take up as much space as it needs (28, in your case) – inspectorG4dget Feb 12 '21 at 20:22
  • Adding another quirk, while `1` is 28 bytes large, it's a [cached integer](https://wsvincent.com/python-wat-integer-cache/) in CPython. Only the reference actually needs another allocation. That reference being 8 bytes large, though, `array` storage can be quite a bit more efficient. – Yann Vernier Feb 12 '21 at 20:42
  • 1
    @YannVernier Caching has nothing to do with it (unless you are referring to the total memory accessible from the list reference). The elements of a list are *always* references to other objects. – chepner Feb 12 '21 at 20:44
  • It only indirectly relates to the numbers listed, but it's relevant in considering how much memory is used and what dynamic allocations occur. Cached integers are preallocated, small integers remain 28 bytes (on that 64-bit system), and long integers can be even larger. Since the asker was expecting the integer to reside *in* the list, it could be relevant to know where it actually is stored. Side note: although lists only hold references, arrays do hold values. This carries another cost, as extracting them has to convert into a Python object. – Yann Vernier Feb 12 '21 at 20:49

1 Answers1

0

What's the minimum information a list needs to function?

  1. some kind of top-level list object, containg a reference to the class information (methods, type info, etc), and the list's own instance data
  2. the actual objects stored in the list

... that gets you the size you expected. But is it enough?

A fixed-size list object can only track a fixed number of list entries: traditionally just one (head) or two (head and tail). Adding more entries to the list doesn't change the size of the list object itself, so there must be some extra information: the relationship between list elements.

It's possible to store this information in every Object (this is called an intrusive list), but it's very restrictive: each Object can only be stored in one list at a time.

Since Python lists clearly don't behave like that, we know this extra information isn't already in the list element, and it can't be inside the list object, so it must be stored elsewhere. Which increases the total size of the list.

NB. I've kept this argument fairly abstract deliberately. You could implement list a few different ways, but none of them avoid some extra storage for element relationships, even if the representation differs.

Useless
  • 64,155
  • 6
  • 88
  • 132
  • 1
    Most of this answer describes a linked list, not a Python list (which is typically implemented as a dynamically sized reference array). – Yann Vernier Feb 12 '21 at 20:53
  • In which case the extra allocation is the reference array itself and any slack allocation in it. The point is that more information than OP anticipated needs to be stored, irrespective of the implementation details. Which is why I very carefully said exactly that. Perhaps I should move the last paragraph to the top? – Useless Feb 13 '21 at 15:27