1

I have a problem where I use a lot of one dimensional arrays and time complexity is really important. My fist thought was that Numpy would be the best option in every case but after testing to append and assign to a Numpy array compared to the built in list function with the following code:

import time
import numpy as np

list_size = 10000
iterations = 1000

start_time = time.time()
for i in range(iterations):
    numpy_arr_append = np.array([])
    for i in range(list_size):
        numpy_arr_append = np.append(numpy_arr_append, i)
avg_np_append = (time.time()-start_time)/iterations
print('Average time for numpy array append:', avg_np_append)

start_time = time.time()
for i in range(iterations):
    arr_append = []
    for i in range(list_size):
        arr_append.append(i)
avg_list_append = (time.time()-start_time)/iterations
print('Average time for regular list append:', avg_list_append)
print('Regular list is:', avg_np_append/avg_list_append, 'times faster.')

print('---------------------------------------------------')

start_time = time.time()
for i in range(iterations):
    numpy_arr_assign = np.zeros(list_size)
    for i in range(list_size):
        numpy_arr_assign[i] = i
avg_np_assign = (time.time()-start_time)/iterations
print('Average time for numpy array assign:', avg_np_assign)

start_time = time.time()
for i in range(iterations):
    list_assign = [0] * list_size
    for i in range(list_size):
        list_assign[i] = i
avg_list_assign = (time.time()-start_time)/iterations
print('Average time for regular list assign:', avg_list_assign)
print('Regular list is:', avg_np_assign/avg_list_assign, 'times faster.')

I get the following result:

Average time for numpy array append: 0.06636584520339966
Average time for regular list append: 0.0010602495670318604
Regular list is: 62.59455062942306 times faster.
---------------------------------------------------
Average time for numpy array assign: 0.0017054696083068847
Average time for regular list assign: 0.0009176504611968995
Regular list is: 1.8585176823018494 times faster.

So my question is, what is the best practice when using large arrays and adding elements multiple times. From my testing the built-in function was always better and there was hardly any difference between appending to it or initializing it and then assigning values. Is there something I am missing or is there another better/faster option?

Best regards, Jakob

JakobVinkas
  • 1,003
  • 7
  • 23
  • 1
    youre using numpy arrays wrong. if all you need to do is append many times, or only update one index at a time, numpy would definitely be slower. – Paritosh Singh Feb 13 '20 at 07:00
  • Does this answer your question? [Fastest way to grow a numpy numeric array](https://stackoverflow.com/questions/7133885/fastest-way-to-grow-a-numpy-numeric-array) – Zaraki Kenpachi Feb 13 '20 at 07:05
  • Also, time complexity != actual time, it's a measure of how something grows as data size increases. append specifically is O(n), so it's pretty bad for numpy. indexing is O(1) – Paritosh Singh Feb 13 '20 at 07:06
  • @ParitoshSingh I am aware of the difference between complexity and time consumption but that is not really my question. I want to know the best way of performing the "generation" of an array when values are added/assigned one by one. With your first comment, do you mean that the best way is to use the regular list? – JakobVinkas Feb 13 '20 at 07:14
  • @ZarakiKenpachi Yes I think it does, thank you very much. – JakobVinkas Feb 13 '20 at 07:16
  • `np.append` is a poorly named cover for `np.concatenate`. Give `concatenate` (and its `stack` covers) a whole list of arrays. – hpaulj Feb 13 '20 at 07:21
  • @JakobVinkas correct, my first comment hints at it, but the 2nd makes it clear. append in numpy is an O(n) operation (you have to find a new memory block that can fit the old array + 1 item, and reassign every single item into this new memory block. Easier to think of arrays as immutable fixed blocks in memory). Appending items one by one is horribly inefficient for numpy. For list, append is O(1). So, if you *must* do it, use a list while appending, convert to numpy only when all items are collected... – Paritosh Singh Feb 13 '20 at 09:07
  • But my main point, perhaps more importantly, is that perhaps there *might* be a better way to do whatever you're doing, depending on what you're actually doing, compared to appending items one by one. Numpy shines when you use it right, and that requires moving away from "iteration" style programming, and thinking of vectorization. – Paritosh Singh Feb 13 '20 at 09:07

0 Answers0