
Suppose you want to write a function that returns a list of objects, and you know in advance the length n of that list.

In Python, lists support indexed access in O(1), so it is arguably a good idea to pre-allocate the list and fill it by index instead of starting from an empty list and using the append() method. That way we avoid the cost of repeatedly growing the list when it runs out of space.

If I'm using Python, performance is probably not that relevant anyway, but what is the better way of pre-allocating a list?

I know these possible candidates:

  • [None] * n → allocates two lists (the one-element list and the result)
  • [None for x in range(n)] (or xrange in Python 2) → builds an extra object along the way

Is one significantly better than the other?

What if we are in the case n = len(input)? Since input already exists, would [None for x in input] perform better than [None] * len(input)?
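
To make it concrete, here is a sketch of the variants I have in mind (n and input are just placeholders):

n = 1000
input = list(range(n))        # stands in for some pre-existing sequence

a = [None] * n                # multiply a one-element list
b = [None for x in range(n)]  # comprehension over range (xrange in Python 2)
c = [None for x in input]     # comprehension over the existing input
d = [None] * len(input)       # multiply, sized from the input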


3 Answers


When you append an item to a list, Python 'over-allocates'; see the source code of the list object. This means that, for example, when adding one item to a list of 8 items, it actually makes room for 8 new items and uses only the first of those. The next 7 appends are then 'for free'.
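
You can watch the over-allocation happen from pure Python with sys.getsizeof (a rough sketch; the exact byte counts and resize points depend on your CPython version and platform):

import sys

lst = []
prev = sys.getsizeof(lst)
for i in range(32):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != prev:
        # the size only jumps at a resize; in between, appends reuse
        # the slack over-allocated by the previous resize
        print("resized at len %d: %d -> %d bytes" % (len(lst), prev, size))
        prev = size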

In many languages (e.g. old versions of Matlab; the newer JIT might be better) you are always told that you need to pre-allocate your vectors, since appending during a loop is very expensive. In the worst case, appending a single item to a list of length n can cost O(n) time, since you might have to create a bigger list and copy all the existing items over. If you need to do this on every iteration, the overall cost of adding n items is O(n^2), ouch. Python's over-allocation scheme spreads the cost of growing the array over many single appends (see amortized analysis), effectively making the cost of a single append O(1) and the overall cost of adding n items O(n).
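
You can also see the amortization without timing anything, by simulating a growth rule like CPython's (the formula below approximates the one in listobject.c; treat it as a sketch, not the exact implementation) and counting how many existing items get copied during resizes:

def count_resize_copies(n):
    # Simulate appending n items with over-allocation and count how
    # many existing items are copied when the list has to be resized.
    capacity = 0
    copies = 0
    for newsize in range(1, n + 1):
        if newsize > capacity:
            # approximation of CPython's rule: grow by ~1/8 plus a constant
            capacity = newsize + (newsize >> 3) + (3 if newsize < 9 else 6)
            copies += newsize - 1  # items moved into the new, bigger block
    return copies

for n in (10**3, 10**6):
    print(n, count_resize_copies(n))  # total copies stays proportional to n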

Additionally, the overhead of the rest of your Python code is usually so large that the tiny speedup obtainable by pre-allocating is insignificant. So in most cases, simply forget about pre-allocating, unless your profiler tells you that appending to a list is a bottleneck.

The other answers show some profiling of the list pre-allocation itself, but this is useless. The only thing that matters is profiling your complete code, with all your calculations inside your loop, with and without pre-allocation. If my prediction is right, the difference is so small that the computation time you win is dwarfed by the time spent thinking about, writing, and maintaining the extra lines that pre-allocate your list.
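
If you want to test that prediction yourself outside IPython, a plain timeit sketch of the complete loop looks like this; compute is a hypothetical stand-in for your real per-item work:

import timeit

setup = '''
n = 10**5
def compute(i):
    return i * i  # stand-in for the real per-item work
'''

prealloc = '''
lis = [None] * n
for i in range(n):
    lis[i] = compute(i)
'''

append = '''
lis = []
for i in range(n):
    lis.append(compute(i))
'''

print("prealloc:", min(timeit.repeat(prealloc, setup, number=10)))
print("append:  ", min(timeit.repeat(append, setup, number=10)))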

  • The source code mentions the growth pattern `0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...`, so it is probably quadratic (or whatever you call that), not exponential. – Bas Swinckels Mar 06 '14 at 13:38
  • my misunderstanding then... don't remember where i read that. i'll delete the comment. – Corley Brigman Mar 06 '14 at 13:39
  • It is exponential, just a very slow one. The predominant growth rate is (1.125)^k, but growth tends to be dominated by the fixed contribution (+3/+6) over the sequence shown in the comments. – marshall.ward Aug 20 '15 at 01:34

Between those two options, the first one is clearly better, as no Python-level for loop is involved.

>>> %timeit [None] * 100
1000000 loops, best of 3: 469 ns per loop
>>> %timeit [None for x in range(100)] 
100000 loops, best of 3: 4.8 us per loop

Update:

And list.append has amortized O(1) complexity too; it might even be a better choice than pre-creating the list if you assign the list.append method to a local variable first.

>>> n = 10**3
>>> %%timeit
lis = [None]*n           
for _ in range(n):
    lis[_] = _
... 
10000 loops, best of 3: 73.2 us per loop
>>> %%timeit
lis = []                 
for _ in range(n):
    lis.append(_)
... 
10000 loops, best of 3: 92.2 us per loop
>>> %%timeit
lis = [];app = lis.append
for _ in range(n):
    app(_)
... 
10000 loops, best of 3: 59.4 us per loop

>>> n = 10**6
>>> %%timeit
lis = [None]*n
for _ in range(n):
    lis[_] = _
... 
10 loops, best of 3: 106 ms per loop
>>> %%timeit
lis = []      
for _ in range(n):
    lis.append(_)
... 
10 loops, best of 3: 122 ms per loop
>>> %%timeit
lis = [];app = lis.append
for _ in range(n):
    app(_)
... 
10 loops, best of 3: 91.8 ms per loop

Obviously, the first version. Let me explain why.

  1. When you do [None] * n, Python internally creates a list object of size n and copies a reference to the same object (here None) into every slot; this is the reason you should use this method only when you are dealing with immutable objects (a snippet after this list demonstrates the sharing). So memory allocation is done only once, followed by a single pass through the list that stores the reference in every element. list_repeat is the function which corresponds to this type of list creation.

    // Creates a list object of the specified size
    np = (PyListObject *) PyList_New(size);
    ....
    ...
    items = np->ob_item;
    if (Py_SIZE(a) == 1) {             // 'a' is the one-element list [None]
        elem = a->ob_item[0];
        for (i = 0; i < n; i++) {      // 'n' is the repeat count
            items[i] = elem;           // stores a reference to the same item
            Py_INCREF(elem);           // and bumps its reference count
        }
        return (PyObject *) np;
    }
    
  2. When you use a list comprehension to build a list, Python cannot know the final size of the list in advance, so it initially allocates only a small chunk of memory, and the expression is evaluated afresh for each item, with the result stored in the list. When the list grows beyond the allocated length, Python has to allocate memory again (copying the existing references over) before it can continue storing new items.
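
A quick interactive session makes the reference sharing from point 1 visible, and shows why [None] * n is only safe with immutable items (a sketch):

>>> a = [None] * 3
>>> a[0] is a[1] is a[2]    # every slot references the same object
True
>>> rows = [[]] * 3         # same trick with a mutable object...
>>> rows[0].append(1)
>>> rows                    # ...appears to change every row: they are one list
[[1], [1], [1]]
>>> safe = [[] for _ in range(3)]  # the comprehension creates a fresh list each time
>>> safe[0].append(1)
>>> safe
[[1], [], []]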
