82

So i was playing with list objects and found little strange thing that if list is created with list() it uses more memory, than list comprehension? I'm using Python 3.5.2

In [1]: import sys
In [2]: a = list(range(100))
In [3]: sys.getsizeof(a)
Out[3]: 1008
In [4]: b = [i for i in range(100)]
In [5]: sys.getsizeof(b)
Out[5]: 912
In [6]: type(a) == type(b)
Out[6]: True
In [7]: a == b
Out[7]: True
In [8]: sys.getsizeof(list(b))
Out[8]: 1008

From the docs:

Lists may be constructed in several ways:

  • Using a pair of square brackets to denote the empty list: []
  • Using square brackets, separating items with commas: [a], [a, b, c]
  • Using a list comprehension: [x for x in iterable]
  • Using the type constructor: list() or list(iterable)

But it seems that using list() it uses more memory.

And as much list is bigger, the gap increases.

Difference in memory

Why this happens?

UPDATE #1

Test with Python 3.6.0b2:

Python 3.6.0b2 (default, Oct 11 2016, 11:52:53) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof(list(range(100)))
1008
>>> sys.getsizeof([i for i in range(100)])
912

UPDATE #2

Test with Python 2.7.12:

Python 2.7.12 (default, Jul  1 2016, 15:12:24) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof(list(xrange(100)))
1016
>>> sys.getsizeof([i for i in xrange(100)])
920
smci
  • 32,567
  • 20
  • 113
  • 146
vishes_shell
  • 22,409
  • 6
  • 71
  • 81
  • 3
    That's a very interesting question. I can reproduce the phenomenon in Python 3.4.3. Even more interesting: on Python 2.7.5 `sys.getsizeof(list(range(100)))` is 1016, `getsizeof(range(100))` is 872 and `getsizeof([i for i in range(100)])` is 920. All have the type `list`. – Sven Rusch Oct 13 '16 at 10:33
  • Of interest is that this difference is there in Python 2.7.10 as well (although the actual numbers are different from Python 3). Also there in 3.5 and 3.6b. – cdarke Oct 13 '16 at 10:33
  • I get the same numbers for Python 2.7.6 as @SvenFestersen, also when using `xrange`. – RemcoGerlich Oct 13 '16 at 10:34
  • 2
    There is a possible explanation here: http://stackoverflow.com/questions/7247298/size-of-list-in-memory. If one of the methods creates the list using `append()`, there might be an over-allocation of memory. I guess the only way to really clarify this is to have a look at the Python sources. – Sven Rusch Oct 13 '16 at 10:39
  • Only 10% more (you don't really say that anywhere). I'd rephrase the title to "slightly more". – smci Sep 28 '19 at 05:32

2 Answers2

65

I think you're seeing over-allocation patterns this is a sample from the source:

/* This over-allocates proportional to the list size, making room
 * for additional growth.  The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc().
 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
 */

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

Printing the sizes of list comprehensions of lengths 0-88 you can see the pattern matches:

# create comprehensions for sizes 0-88
comprehensions = [sys.getsizeof([1 for _ in range(l)]) for l in range(90)]

# only take those that resulted in growth compared to previous length
steps = zip(comprehensions, comprehensions[1:])
growths = [x for x in list(enumerate(steps)) if x[1][0] != x[1][1]]

# print the results:
for growth in growths:
    print(growth)

Results (format is (list length, (old total size, new total size))):

(0, (64, 96)) 
(4, (96, 128))
(8, (128, 192))
(16, (192, 264))
(25, (264, 344))
(35, (344, 432))
(46, (432, 528))
(58, (528, 640))
(72, (640, 768))
(88, (768, 912))

The over-allocation is done for performance reasons allowing lists to grow without allocating more memory with every growth (better amortized performance).

A probable reason for the difference with using list comprehension, is that list comprehension can not deterministically calculate the size of the generated list, but list() can. This means comprehensions will continuously grow the list as it fills it using over-allocation until finally filling it.

It is possible that is will not grow the over-allocation buffer with unused allocated nodes once its done (in fact, in most cases it wont, that would defeat the over-allocation purpose).

list(), however, can add some buffer no matter the list size since it knows the final list size in advance.


Another backing evidence, also from the source, is that we see list comprehensions invoking LIST_APPEND, which indicates usage of list.resize, which in turn indicates consuming the pre-allocation buffer without knowing how much of it will be filled. This is consistent with the behavior you're seeing.


To conclude, list() will pre-allocate more nodes as a function of the list size

>>> sys.getsizeof(list([1,2,3]))
60
>>> sys.getsizeof(list([1,2,3,4]))
64

List comprehension does not know the list size so it uses append operations as it grows, depleting the pre-allocation buffer:

# one item before filling pre-allocation buffer completely
>>> sys.getsizeof([i for i in [1,2,3]]) 
52
# fills pre-allocation buffer completely
# note that size did not change, we still have buffered unused nodes
>>> sys.getsizeof([i for i in [1,2,3,4]]) 
52
# grows pre-allocation buffer
>>> sys.getsizeof([i for i in [1,2,3,4,5]])
68
Moses Koledoye
  • 77,341
  • 8
  • 133
  • 139
Reut Sharabani
  • 30,449
  • 6
  • 70
  • 88
  • 4
    But why would the over-allocaton happen with one but not the other? – cdarke Oct 13 '16 at 10:41
  • This specifically is from `list.resize`. I'm not an expert in navigating throught he source, but if one calls resize and the other doesn't - it could explain the difference. – Reut Sharabani Oct 13 '16 at 10:42
  • 6
    Python 3.5.2 here. Try printing sizes of lists from 0 to 35 in loop. For list I see `64, 96, 104, 112, 120, 128, 136, 144, 160, 192, 200, 208, 216, 224, 232, 240, 256, 264, 272, 280, 288, 296, 304, 312, 328, 336, 344, 352, 360, 368, 376, 384, 400, 408, 416` and for comprehension `64, 96, 96, 96, 96, 128, 128, 128, 128, 192, 192, 192, 192, 192, 192, 192, 192, 264, 264, 264, 264, 264, 264, 264, 264, 264, 344, 344, 344, 344, 344, 344, 344, 344, 344`. I would except that comprehension being the one who seem to preallocate memory to be the algorithm that uses more RAM for certain sizes. – tavo Oct 13 '16 at 11:02
  • I'd expect the same. I can look further into it soon. Good comments. – Reut Sharabani Oct 13 '16 at 11:05
  • 4
    actually `list()` deterministically determines list size, which list comprehension can't do. This suggests list comprehension doesn't always "trigger" the "last" growth of the list. Could make sense. – Reut Sharabani Oct 13 '16 at 11:08
  • I think that since range does support `len` list allocates `len` + some preallocation amount of spaces. Interestingly enought, when I use generator object as argument for list (generators does not support `len`) I get different size pattern than with list comprehension: `64, 96, 104, 112, 120, 128, 160, 160, 160, 160, 160, 160, 160, 224, 224, 224, 224, 224, 224, 224, 224, 296, 296, 296, 296, 296, 296, 296, 296, 296, 376, 376, 376, 376, 376` – tavo Oct 13 '16 at 11:33
30

Thanks everyone for helping me to understand that awesome Python.

I don't want to make question that massive(that why i'm posting answer), just want to show and share my thoughts.

As @ReutSharabani noted correctly: "list() deterministically determines list size". You can see it from that graph.

graph of sizes

When you append or using list comprehension you always have some sort of boundaries that extends when you reach some point. And with list() you have almost the same boundaries, but they are floating.

UPDATE

So thanks to @ReutSharabani, @tavo, @SvenFestersen

To sum up: list() preallocates memory depend on list size, list comprehension cannot do that(it requests more memory when it needed, like .append()). That's why list() store more memory.

One more graph, that show list() preallocate memory. So green line shows list(range(830)) appending element by element and for a while memory not changing.

list() preallocates memory

UPDATE 2

As @Barmar noted in comments below, list() must me faster than list comprehension, so i ran timeit() with number=1000 for length of list from 4**0 to 4**10 and the results are

time measurements

Community
  • 1
  • 1
vishes_shell
  • 22,409
  • 6
  • 71
  • 81
  • 1
    The answer why red line is above blue is, that when `list` constructor can determine size of new list from it's argument it will still preallocate the same amount of space as it would if the last element just got there and there was not enough space for it.At least that's what makes sense to me. – tavo Oct 13 '16 at 11:43
  • @tavo it's seems the same to me, after some moment i want to show it in the graph. – vishes_shell Oct 13 '16 at 11:46
  • 2
    So while list comprehensions use less memory, they're probably significantly slower because of all the resizes that occur. These will often have to copy the list backbone to a new memory area. – Barmar Oct 18 '16 at 18:46
  • @Barmar actually i can run some time measurements with `range` object(that could be fun). – vishes_shell Oct 18 '16 at 18:50
  • And it will make your graphs even prettier. :) – Barmar Oct 18 '16 at 19:00
  • I was hoping you'd overlay them on the same graph. However, what this shows is that you don't really notice much performance impact until the list gets very large. – Barmar Oct 18 '16 at 20:12
  • @Barmar overlaying won't work because Y axis is different in measurements(time and memory) hence can't really overlay on each other. And yeah, list comprehension are fast. – vishes_shell Oct 18 '16 at 20:17