2

Short version of my question: Why is it common practice in Python to initialize an empty list even when the size of the list might change many times such as the following:

arr = []
for i in range(10):
    arr.append(i)

Isn't it computationally expensive to change the size of an array iteratively?

Long version of my question: I'm used to using MATLAB and am relatively new to Python. When I want to create an array, it is common practice in MATLAB to initialize an array of zeros of appropriate size, and then replace the elements of the array with the elements you want to end up with. This is because changing the size of an array iteratively in MATLAB is computationally expensive. Is there something about Python that avoids this expense? When I see people answer Python questions on this website that involve preinitializing a list to be added to, they always create an empty list and subsequently change the size, something I have always regarded as inefficient.

Casey
  • 27
  • 1
  • it is not an array, it is a list. Lists work differently than arrays. – furas Dec 13 '17 at 03:57
  • For avoiding the expense, `list`s are designed to grow by larger steps as they increase in size (so the `realloc`s reduce as the number of elements increases), and they only store pointers to their contents, so when `realloc` has to move the allocation, it's effectively just a `memcpy` of `8 * numelements` bytes; it costs a little, but the overhead of the Python interpreter generally masks small costs like that. – ShadowRanger Dec 13 '17 at 04:09
  • 1
    @furas: This is [not quite correct](https://docs.python.org/2/faq/design.html#how-are-lists-implemented) (or at least correct only in Python context, due to how it twisted the usual terminology). For some unknown reason, Python co-opted the term "list" to what every other language calls "array" or "vector". Then numpy took the unused term "array" and stole it for itself. – Amadan Dec 13 '17 at 04:37

2 Answers

4

Python does not grow the list one element at a time. It always over-allocates in larger chunks, and the chunk size depends on the current size of the list, so the pre-allocated space grows as the list grows.

For example, if you do 10 million appends, only about 100 allocations occur, i.e. steps that actually grow the list.

If you start with an empty list and grow it by appending one element at a time, these are the steps that actually allocate new memory:

 list size:  allocations
        10:   3
       100:  10
      1000:  27
     10000:  46
    100000:  65
   1000000:  85
  10000000: 104
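The growth pattern in the table above can be observed directly (this is CPython-specific behavior; the exact numbers vary by version) by watching `sys.getsizeof` change as elements are appended:

```python
import sys

def count_reallocations(n):
    """Count how many times the list's underlying buffer grows
    while appending n elements one at a time (CPython-specific:
    sys.getsizeof reflects the allocated capacity)."""
    lst = []
    reallocs = 0
    last_size = sys.getsizeof(lst)
    for i in range(n):
        lst.append(i)
        size = sys.getsizeof(lst)
        if size != last_size:
            reallocs += 1
            last_size = size
    return reallocs

# For 1000 appends, only a few dozen reallocations happen, not 1000.
print(count_reallocations(1000))
```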

MATLAB arrays are more comparable to NumPy arrays. These arrays are fixed in size, and growing them step-by-step is very expensive.
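For contrast, here is a sketch of why step-by-step growth is expensive for fixed-size arrays: `np.append` returns a new array and copies every existing element on each call (quadratic overall), while preallocating with `np.zeros` and filling in place is linear. (Illustrative only; assumes `numpy` is installed.)

```python
import numpy as np

n = 10_000

# Growing element by element: np.append allocates a NEW array and
# copies all existing elements on every call -- O(n**2) in total.
grown = np.array([], dtype=int)
for i in range(n):
    grown = np.append(grown, i)

# Preallocating once and writing in place -- O(n) in total.
filled = np.zeros(n, dtype=int)
for i in range(n):
    filled[i] = i

assert (grown == filled).all()
```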

Brad Solomon
  • 38,521
  • 31
  • 149
  • 235
Mike Müller
  • 82,630
  • 20
  • 166
  • 161
1

The main reason to create an empty list and populate it later is that you don't know in advance how many elements will end up in it. Lists resize themselves dynamically, while arrays inherently do not, since they are typically reserved as contiguous blocks of memory. It's convenient to write your code in a way that takes advantage of this fact.

However, the above isn't a good use of creating a list. You would want to use a list comprehension instead to generate this:

arr = [i for i in range(10)]

Pre-filling the list makes sense only when you know in advance how many elements it will hold; you can't pre-size it if you don't know.
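A case where pre-sizing isn't possible, sketched with a hypothetical stopping condition of unknown length:

```python
def read_squares_until_limit(limit):
    """Collect squares until they would exceed `limit`.
    The final length isn't known up front, so appending to an
    empty list is the natural (and efficient) approach."""
    result = []
    i = 0
    while i * i <= limit:
        result.append(i * i)
        i += 1
    return result

print(read_squares_until_limit(50))  # [0, 1, 4, 9, 16, 25, 36, 49]
```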

Makoto
  • 104,088
  • 27
  • 192
  • 230