4

In terms of readability and performance, should I pre-allocate memory for a list using `[None] * n`? Is starting with an empty one (`[]`) and calling `.append()` over and over considered wasteful?

quamrana
clumsy
    Useful: [Stacks / list in python - how does it append?](https://stackoverflow.com/questions/49244003/stacks-list-in-python-how-does-it-append) and [Python - Create a list with initial capacity](https://stackoverflow.com/questions/311775/python-create-a-list-with-initial-capacity) – pault Aug 26 '20 at 14:23
  • You don't need to preallocate anything. Behind the scenes, the `list` type will periodically allocate more space than it needs for its immediate use to amortize the cost of resizing the underlying array across multiple updates. – chepner Aug 26 '20 at 14:29
  • I agree with @chepner. In my experience it is definitely more pythonic to use `.append`. Use of `[None]*n` is definitely a code smell. If you want to be industrious with memory, you should look into using a python generator instead – user32882 Aug 26 '20 at 14:32
  • For readability, consider if a list comprehension is suitable. – chepner Aug 26 '20 at 14:39
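To see the over-allocation that chepner describes, here is a quick sketch using `sys.getsizeof`. The exact byte counts are CPython implementation details and vary by version, so treat the numbers as illustrative:

```python
import sys

lst = []
sizes = []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# The size grows in plateaus: each jump allocates spare capacity,
# so most appends do not need to resize the underlying array.
print(sizes)
```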

1 Answer

5

In this simple timing test, the use of `[None] * n` does indeed appear to be slightly quicker, but arguably not by enough to justify adopting this approach over the more usual idioms.

import time

def func1(size):
    # preallocate with [None] * size, then assign by index
    a = [None] * size
    for i in range(size):
        a[i] = i

def func2(size):
    # start empty and append in a loop
    a = []
    for i in range(size):
        a.append(i)

def func3(size):
    # build with a list comprehension
    a = [i for i in range(size)]

size = 1000000
repeat = 100

t0 = time.time()
for _ in range(repeat):
    func1(size)
t1 = time.time()

for _ in range(repeat):
    func2(size)
t2 = time.time()

for _ in range(repeat):
    func3(size)
t3 = time.time()

print(t1 - t0, t2 - t1, t3 - t2)

Results:

  • `[None] * size` and then index: 4.82 seconds
  • append in a loop: 6.37 seconds
  • list comprehension: 6.34 seconds

Repeating the tests with `size=1000` and `repeat=100000` gives similar results:

  • `[None] * size` and then index: 3.16 seconds
  • append in a loop: 4.88 seconds
  • list comprehension: 4.84 seconds

And again with `size=10` and `repeat=10000000`:

  • `[None] * size` and then index: 6.09 seconds
  • append in a loop: 7.65 seconds
  • list comprehension: 7.66 seconds
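For a more robust comparison, the same three idioms can be timed with the standard `timeit` module, which handles the repetition loop itself. This is a sketch of the approach above, not the answer's original benchmark; absolute numbers will differ by machine and Python version:

```python
import timeit

setup = "size = 1000"
stmts = {
    "preallocate + index": "a = [None] * size\nfor i in range(size): a[i] = i",
    "append in a loop":    "a = []\nfor i in range(size): a.append(i)",
    "list comprehension":  "a = [i for i in range(size)]",
}

# Run each statement many times and report the total wall time.
for name, stmt in stmts.items():
    t = timeit.timeit(stmt, setup=setup, number=10_000)
    print(f"{name}: {t:.3f}s")
```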
alani
  • How about with contents that are a bit more sizeable, say with multiple elements of numpy arrays? – berniethejet Aug 05 '21 at 14:43
  • 1
    @berniethejet You'll have to try it yourself (just adapt my test code) but I would expect the same result again. Bear in mind that the list object will just store _references_ to the objects that are its elements, rather than those objects themselves. – alani Aug 11 '21 at 14:29
  • Yes, thanks @alani, this is good to know. So basically appending to a list is not affected by the size of the list, and only pointers are ever appended. And basically this is the only way to collect a large set of results in, say, a simulation: pandas DataFrames and numpy arrays won't do. – berniethejet Aug 13 '21 at 15:17
  • Indeed, list append in Python is not strictly O(1), and can be O(n): "The .append(item) method is also O(1) until Python has to extend the array under the list. But array extension is an O(n) operation.": https://antonz.org/list-internals/. So perhaps this test here is insufficiently large. – berniethejet Nov 15 '21 at 04:23
  • The example is not correct, you use again append instead of list comprehension. In truth, list comprehension is the fastest for this case. – SergioGM Oct 19 '22 at 06:52
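A small sketch of alani's point that a list stores references rather than copies of its elements. The byte counts are CPython-specific, but the contrast holds regardless of how large the elements are:

```python
import sys

big = list(range(1_000_000))    # a large element
container = [big, big, big]     # stores three references, not three copies

print(sys.getsizeof(container)) # tiny: list header plus one pointer per slot
print(sys.getsizeof(big))       # several megabytes
```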