
For context, I was solving the day 6 problem from the 2021 Advent of Code and wanted to try using numpy arrays instead of Python lists, since from my current understanding they should be faster. But I ran into an issue: my solution prints the correct answer, but takes ages to finish computing as the num_of_days_to_cycle_through variable scales.

I wanted help understanding why the incredibly long scaling was occurring, and how to notice/prevent that mistake in my code going forward. (The lantern_fish_array is a numpy array of int64.)

import numpy as np

def iterate_through_one_day(lantern_fish_array):
    iterator = 0

    copy_of_lantern_fish_array = lantern_fish_array.copy()
    for fish in copy_of_lantern_fish_array:
        if fish == 0:
            # a fish at 0 spawns a new fish (timer 8) and resets to 6
            lantern_fish_array = np.append(lantern_fish_array, 8)
            lantern_fish_array[iterator] = 6
        else:
            lantern_fish_array[iterator] -= 1
        iterator += 1
    del copy_of_lantern_fish_array

    return lantern_fish_array


def solve_part_1(lantern_fish_array):
    num_of_days_to_cycle_through = 256
    while num_of_days_to_cycle_through != 0:
        lantern_fish_array = iterate_through_one_day(lantern_fish_array)
        num_of_days_to_cycle_through -= 1
    return lantern_fish_array.size
  • Numpy arrays are fast if you use their built-in vectorized operations. If you just iterate over them and use them like lists, they are usually *much slower* than using a list – juanpa.arrivillaga Jul 14 '22 at 22:01
  • **Also**, `lantern_fish_array = np.append(lantern_fish_array, 8)` is an anti-pattern in `numpy`. Numpy arrays are fixed-size arrays; they do not have an efficient amortized constant-time `.append` like `list` objects, so this code will be polynomial time instead of linear time, as you might expect with Python lists (a sketch illustrating this follows the comments) – juanpa.arrivillaga Jul 14 '22 at 22:03
  • Can you give an example of the input `lantern_fish_array`? – Stuart Jul 14 '22 at 22:05
  • An example of the input would look like lantern_fish_array = [1,2,3,4,5], all int64 type – SnorLaxatives Jul 14 '22 at 23:29
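
To illustrate the append anti-pattern mentioned in the comments, here is a minimal sketch (the sizes and helper names are illustrative assumptions, not from the question): growing a numpy array one element at a time copies the whole array on every call, while collecting items in a Python list and converting once at the end stays linear.

import numpy as np
import timeit

def grow_with_np_append(n):
    # np.append allocates a brand-new array and copies every existing
    # element on each call, so building n elements costs O(n^2) work.
    arr = np.empty(0, dtype=np.int64)
    for i in range(n):
        arr = np.append(arr, i)
    return arr

def grow_with_list(n):
    # Appending to a Python list is amortized O(1); convert once at the end.
    items = []
    for i in range(n):
        items.append(i)
    return np.array(items, dtype=np.int64)

for n in (1_000, 10_000):
    t_np = timeit.timeit(lambda: grow_with_np_append(n), number=5)
    t_list = timeit.timeit(lambda: grow_with_list(n), number=5)
    print(n, f"np.append: {t_np:.3f}s", f"list append: {t_list:.3f}s")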

3 Answers


Numpy arrays are fast due to operations being done in parallel (vectorized). If you use looping, they might well be slower than a list. So unless your operation can be expressed in a parallel (vectorized) manner, you probably won't see any benefit from using a looping/iterative operation on a np array versus the same operation on a list.
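
As a hedged illustration (the helper names and array size are assumptions, not from the answer), the same "subtract 1 from every element" step can be written as an explicit Python loop or as a single vectorized numpy expression; the latter does the per-element work in C:

import numpy as np

timers = np.random.randint(0, 9, size=100_000)

def decrement_loop(arr):
    # One Python-level operation per element.
    out = arr.copy()
    for i in range(len(out)):
        out[i] -= 1
    return out

def decrement_vectorized(arr):
    # A single numpy expression; the per-element work happens in C.
    return arr - 1

assert np.array_equal(decrement_loop(timers), decrement_vectorized(timers))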

See these resources on broadcasting and vectorization:

https://realpython.com/numpy-array-programming/

https://unidata.github.io/python-training/workshop/NumPy/numpy-broadcasting-and-vectorization/

Also, some suggestions for when you can't avoid looping:

Fast Numpy Loops
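
The "Fast Numpy Loops" suggestion above typically points toward JIT compilation; as one hedged sketch, assuming the numba package is available, the same element-by-element loop can be compiled so it no longer pays Python interpreter overhead per element:

import numpy as np
from numba import njit

@njit
def decrement_loop_jit(arr):
    # The same element-by-element loop, compiled to machine code by Numba.
    out = arr.copy()
    for i in range(out.shape[0]):
        out[i] -= 1
    return out

timers = np.random.randint(0, 9, size=100_000)
decrement_loop_jit(timers)  # the first call includes compilation time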

Tom M
  • The word you are looking for is *vectorized*, not parallel. – juanpa.arrivillaga Jul 14 '22 at 22:01
  • Yep, I had already planned some edits. Though vectorization 'is the process by which a conventional procedure becomes parallel'. So it is the same thing. – Tom M Jul 14 '22 at 22:02
  • @TomM No, they are not the same thing. IDK where you got this definition. "[Vectorization](https://numpy.org/doc/stable/glossary.html#term-vectorization) can refer both to the C offloading and to structuring NumPy code to leverage it" in a Numpy/Python context. Most Numpy functions are not multi-threaded, and until recently many were not using SIMD instructions either. They were purely sequential (i.e. the code had a sequential control flow, though modern processors are superscalar). Good examples are `np.unique` or `np.cumsum`, which are not parallel at all but are vectorized operations. – Jérôme Richard Jul 14 '22 at 22:58
  • @JérômeRichard https://indico.cern.ch/event/814979/contributions/3401203/attachments/1831468/3115808/VectorParallelismMultiCoreProc.pdf Or https://stackoverflow.com/questions/14259737/what-is-the-relationship-between-vectorization-and-embarrasingly-parallel Vectorization is a subset of parallelism known as 'vector parallelism'. That is what the numpy operation got its name from. A function must be parallelizable for vectorization to benefit it. Any function that can act on vectors vs scalars is inherently capable of vector parallelism. – Tom M Jul 16 '22 at 01:55
  • @TomM These sources do not really apply in the context of Numpy/Python. In fact, there are multiple definitions of "vectorization" depending on the context. The provided one is inherited from old, defunct vector machines. It can apply in low-level languages like C or even to CPU architectures (though SIMD/SIMT should often be preferred). Numpy inherited its definition from languages like Matlab, since they worked on vectors/matrices and scalar operations were (initially) slow. See: https://wikipedia.org/wiki/Array_programming (and https://wikipedia.org/wiki/Automatic_vectorization for the other) – Jérôme Richard Jul 16 '22 at 03:02
  • @TomM It is all about choosing a definition, but the one Numpy uses is explicitly defined in the doc, and it refers to the former (vector/array programming). It makes sense, as not all operations are actually vectorized in C (based on the latter definition): not all use SIMD instructions (though we work on it). "*A function must be parallelizable for vectorization to benefit it*" Well, SIMD/SIMT implies parallelism, but parallelism does not imply SIMD/SIMT. This is why it is not the "same thing" whichever of the former/latter definitions is chosen (i.e. there is no equivalence). – Jérôme Richard Jul 16 '22 at 03:15
  • "*Any function that can act on vectors vs scalars is inherently capable of vector parallelism*" Theoretically yes, but in practice this is not the case because the instruction set of mainstream CPUs/GPUs are too limited (even the one of GPU actually). For example, cumsum can be implemented using a scan pattern but no x86/ARM/POWER CPU support fast instructions for that yet (most GPU do). For histograms, gather/scatter/CD are missing or inefficient on CPUs (still not great on GPUs). For sorting, it is also a mess, even on GPUs (AVX-512 on Intel CPUs help a bit thanks to compress instructions) – Jérôme Richard Jul 16 '22 at 03:46

You can do something like:

import numpy as np

# numpy style
def iterate_through_one_day_np(lantern_fish_array):
    new_array=np.copy(lantern_fish_array)
    mask_0 = new_array==0
    
    new_array[mask_0] = 6 
    new_array[~mask_0] -= 1  # or new_array[~mask_0] = new_array[~mask_0] - 1

    new_array = np.append(new_array, [8]*np.sum(mask_0))
    return new_array

# your code for reference
def iterate_through_one_day(lantern_fish_array):
    iterator = 0
    new_array=np.copy(lantern_fish_array)
    for fish in lantern_fish_array:
        if fish == 0:
            new_array = np.append(new_array, 8)
            new_array[iterator] = 6
        else:
            new_array[iterator] -= 1
        iterator += 1
    
    return new_array

lantern_fish_array = [0,1,2,3,4,5]

iterate_through_one_day(lantern_fish_array)
# array([6, 0, 1, 2, 3, 8])

iterate_through_one_day_np(lantern_fish_array)
# array([6, 0, 1, 2, 3, 8])
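
If you want to plug this into the question's solve loop, a hedged sketch (the helper name and the reduced day count are illustrative; the naive simulation still grows the array exponentially, so very large day counts remain impractical):

def solve_with_np_day(lantern_fish_array, days=80):
    # Same structure as the question's solve_part_1, but using the
    # vectorized per-day step instead of the per-fish Python loop.
    for _ in range(days):
        lantern_fish_array = iterate_through_one_day_np(lantern_fish_array)
    return lantern_fish_array.size

solve_with_np_day(np.array([0, 1, 2, 3, 4, 5]), days=80)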

Speed testing

Test with a few more fish, i.e. the list repeated 50 times: [0,1,2,3,4,5]*50

%%timeit -r 3 -n 3
iterate_through_one_day_np([0,1,2,3,4,5]*50)
# 94.4 µs ± 4.89 µs per loop (mean ± std. dev. of 3 runs, 3 loops each)

vs.

%%timeit -r 3 -n 3
iterate_through_one_day([0,1,2,3,4,5]*50)
# 878 µs ± 154 µs per loop (mean ± std. dev. of 3 runs, 3 loops each)

For arrays with many occurrences of 0, using numpy.where is much faster than modifying the values through indexing twice. For arrays with fewer occurrences of 0, its performance is also slightly faster than LHans's method:

>>> import numpy as np
>>> from timeit import timeit
>>> def one_day_use_where(lantern_fish_array):
...     mask = lantern_fish_array == 0
...     return np.concatenate([np.where(mask, 6, lantern_fish_array - 1), np.repeat(8, mask.sum())])
...

Some test:

>>> def one_day_use_index(lantern_fish_array):
...     mask = lantern_fish_array == 0
...     new_array = lantern_fish_array.copy()
...     new_array[mask] = 6
...     new_array[~mask] -= 1
...     return np.concatenate([new_array, np.repeat(8, mask.sum())])
...
>>> a = np.random.randint(0, 10, 10000)   # 0 accounts for about 10%
>>> timeit(lambda: one_day_use_where(a), number=10000)
0.4181383000104688
>>> timeit(lambda: one_day_use_index(a), number=10000)
0.8232910000078846
>>> a = np.random.randint(0, 2, 10000)    # 0 accounts for about 50%
>>> timeit(lambda: one_day_use_where(a), number=10000)
0.544302800000878
>>> timeit(lambda: one_day_use_index(a), number=10000)
1.917074600001797
>>> a = np.random.randint(1, 3, 10000)    # Does not contain 0
>>> timeit(lambda: one_day_use_where(a), number=10000)
0.38596799998776987
>>> timeit(lambda: one_day_use_index(a), number=10000)
0.3989579000044614
>>> a = np.zeros(10000, dtype=int)
>>> # If the proportion of 0 is too high, the performance will be slightly worse
>>> timeit(lambda: one_day_use_where(a), number=10000)
0.6532589000125881
>>> timeit(lambda: one_day_use_index(a), number=10000)
0.5977481000008993
Mechanic Pig