
People coming to Python from other programming languages often ask how they should pre-allocate or initialize their lists. This is especially true for people coming from Matlab, where code such as

l = []
for i = 1:100
    l(end+1) = 1;
end

produces a warning that explicitly suggests initializing the list.

There are several posts on SO explaining (and showing through tests) that list initialization isn't required in Python. A good example with a fair bit of discussion is this one (but the list could be very long): Create a list with initial capacity in Python
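
As a quick point of reference, here is a minimal timing sketch (numbers will vary with Python version and machine) comparing the plain append loop, a pre-allocated list, and a list comprehension:

import timeit

N = 100_000

def append_loop():
    result = []
    for i in range(N):
        result.append(i)
    return result

def preallocated():
    result = [None] * N          # allocate the final size up front
    for i in range(N):
        result[i] = i
    return result

def comprehension():
    return [i for i in range(N)]

for fn in (append_loop, preallocated, comprehension):
    print(fn.__name__, timeit.timeit(fn, number=20))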

The other day, however, while looking up the complexity of operations in Python, I stumbled upon this sentence on the official Python wiki:

the largest [cost for list operations] come from growing beyond the current allocation size (because everything must move),

This seems to suggest that lists do indeed have a pre-allocation size and that growing beyond that size causes the whole list to move.

This shook my foundations a bit. Can list pre-allocation reduce the overall complexity (in terms of number of operations) of a piece of code? If not, what does that sentence mean?

EDIT:

Clearly, my question concerns the (very common) code:

container = ... #some iterable with 1 gazillion elements
new_list = []
for x in container:
    ... #do whatever you want with x
    new_list.append(x) #or something computed using x
    

In this case the interpreter cannot know how many items there are in container, so new_list could potentially require its allocated memory to change an incredible number of times, if what is said in that sentence is true.

I know that this is different for list comprehensions.
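
For completeness, the comprehension form of the same pattern would look like this (container and process are just placeholder names standing in for the real iterable and the per-item work):

container = range(1_000_000)   # stand-in for the real iterable

def process(x):                # placeholder for whatever is done with x
    return x * 2

new_list = [process(x) for x in container]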

Luca
  • That sentence says that growing beyond the current allocation is the heaviest operation with lists. It doesn’t say that it’s too heavy, or that preallocation is advised. Benchmark when in doubt for your specific case. – deceze Feb 27 '21 at 14:34
  • @deceze I know I could benchmark, but there are already so many benchmarks on this that mine would be a drop in the ocean. Also, in the question I posted, it is shown how timing these things can be tricky, as seemingly innocuous operations such as just generating the item to append can mask the thing you want to test. To be honest, I was looking more for a theoretical explanation than just a “look, it works”. But the question remains: what is the current allocation? And if growing beyond it is heavy, why is pre-allocation always discouraged? – Luca Feb 27 '21 at 14:37
  • Lists in Python, as they're built to be mutated, have some preallocated memory, which allows operations like append to be faster; this is also why tuples take up less space in memory (illustrated in the sketch just below these comments). However, preallocation has no benefit to you at all if you could just declare the list as you'd prefer it (i.e. a list comprehension, or `[1]*1000`) – Kraigolas Feb 27 '21 at 14:38
  • Does this answer your question? [Create a list with initial capacity in Python](https://stackoverflow.com/questions/311775/create-a-list-with-initial-capacity-in-python) – Kraigolas Feb 27 '21 at 14:40
  • @Kraigolas so, following your reasoning, the typical `for i in range(N): l.append(i)` should be discouraged in favour of a list comprehension or (as you wrote) preallocation in the form of `[None]*N`. Why is this not the case? – Luca Feb 27 '21 at 14:41
  • @Kraigolas no, given that I posted the exact same link in my question – Luca Feb 27 '21 at 14:42
  • @LucaAmerio - from your comments it sounds like you are trying to optimize list creation before you know whether it is a particular problem. Locking yourself into one specific way of creating a list just because it is theoretically more efficient is going to restrict you to certain modes of working with it. – wwii Feb 27 '21 at 14:57
  • @wwii this question is not related to any specific code I've written or am writing. Just, after years of using one strategy (maybe because I locked myself into it, as you said), I read something that is making me question the way I've been coding for years. I understand that any problem could require a specific solution. For example, I would never be worried if I needed to append 10 or 100 items to a list. But should I try to avoid it if my list can become thousands of items long? Let's put it this way: this question serves exactly the opposite purpose, unlocking me from a specific way I've been doing things – Luca Feb 27 '21 at 15:01
  • ...... No ...... – wwii Feb 27 '21 at 15:03
  • @wwii I praise your conciseness, but can you explain why, given the sentence above? – Luca Feb 27 '21 at 15:05
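
A rough way to observe the over-allocation mentioned in the comments is to compare sys.getsizeof for an append-built list, a list built at its final size in one step, and a tuple (a sketch; exact numbers vary by CPython version and platform):

import sys

n = 1000

appended = []
for i in range(n):
    appended.append(i)          # grows incrementally, so it usually keeps spare capacity

exact = [0] * n                 # created at its final size in one allocation
as_tuple = tuple(range(n))      # immutable, no spare capacity needed

print(sys.getsizeof(appended))  # usually the largest of the three
print(sys.getsizeof(exact))
print(sys.getsizeof(as_tuple))  # usually the smallest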

1 Answer


Can list pre-allocation reduce the overall complexity (in terms of number of operations) of a piece of code?

No, the overall time complexity of the code will be the same, because the time cost of reallocating the list is O(1) when amortised over all of the operations which increase the size of the list.
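
To see why the amortised cost is O(1), here is a small simulation counting how many element copies a geometrically growing list performs per append (the 1.125 growth factor is an illustrative assumption, not CPython's exact rule):

def copies_for_appends(n, growth=1.125):
    """Count element copies made by reallocations while appending n items."""
    capacity = 0
    size = 0
    copies = 0
    for _ in range(n):
        if size == capacity:
            copies += size                               # move the existing elements
            capacity = max(4, int(capacity * growth) + 1)
        size += 1
    return copies

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, copies_for_appends(n) / n)                  # ratio stays roughly constant

The copies-per-append ratio is bounded by a constant that depends only on the growth factor, which is exactly the amortised-O(1) argument.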

If not, what does that sentence mean?

In principle, pre-allocating the list could reduce the running time by some constant factor, by avoiding multiple re-allocations. This doesn't mean the complexity is lower, but it may mean the code is faster in practice. If in doubt, benchmark or profile the relevant part of your code to compare the two options; in most circumstances it won't matter, and when it does, there are likely to be better alternatives anyway (e.g. NumPy arrays) for achieving the same goal.
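
As an illustration of the "better alternatives" point: if the values are numeric and the final length is known up front, a NumPy array can be allocated once and filled in place (a sketch, not a recommendation for every case):

import numpy as np

n = 1_000_000
out = np.empty(n, dtype=np.float64)   # one allocation, no growth while filling
for i in range(n):
    out[i] = i * 0.5

In practice a vectorised expression such as np.arange(n) * 0.5 would usually be both simpler and faster than the explicit loop.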

new_list could potentially require its allocated memory to change an incredible number of times

List reallocation follows a geometric progression, so if the final length of the list is n then the list is reallocated only O(log n) times along the way; not an "incredible number of times". The way the maths works out, the average number of times each element gets copied to a new underlying array is a constant regardless of how large the list gets, hence the O(1) amortised cost of appending to the list.
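
You can get a rough sense of this in CPython by watching sys.getsizeof, which reflects a list's allocated capacity: counting how often it changes while appending shows only a small number of reallocations (a sketch; the exact count depends on the CPython version):

import sys

lst = []
reallocations = 0
last = sys.getsizeof(lst)
for i in range(1_000_000):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last:           # the underlying allocation grew
        reallocations += 1
        last = size

print(reallocations)           # a small number compared to 1_000_000 appends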

kaya3