How to avoid building numpy array iteratively

Question

I know that iterating over NumPy arrays in Python is slower than using NumPy's vectorized functions. This matters to me because my arrays are quite large.

The following code shows what I want to achieve:

import numpy as np

start_value = 12
start_arr = np.array([-2, -4, -60, -0.5, 2, 2, 1, 70, -2, -5, 2])

out_arr = []
ans = start_value
for i in start_arr:
    if i > 0:
        out_arr.append(i)
        ans = i
    else:
        out_arr.append(ans)

out_arr = np.array(out_arr)
# [12, 12, 12, 12, 2, 2, 1, 70, 70, 70, 2]

I don't know how to tell NumPy to reuse the previously assigned value when i <= 0. I also can't phrase the problem for a search engine in a way that returns anything useful.

can you summarize the logic in a few words? – mozway Apr 04 '23 at 13:38
Pandas has `ffill` for this kind of thing. – user2357112 Apr 04 '23 at 13:42

mozway · Answer 1 · 2023-04-04T17:36:10.823

Using pandas:

import pandas as pd

out_arr = (pd.Series(start_arr).where(start_arr>0)
           .ffill().fillna(start_value, downcast='infer').to_numpy()
           )

Output:

array([12, 12, 12, 12,  2,  2,  1, 70, 70, 70,  2])
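
To see what each step of the chain contributes, it can help to run the steps one at a time. The following is only an illustrative sketch: the intermediate names `masked` and `filled` are not part of the answer, and the `downcast` argument is left out, so the result stays a float array.

```python
import numpy as np
import pandas as pd

start_value = 12
start_arr = np.array([-2, -4, -60, -0.5, 2, 2, 1, 70, -2, -5, 2])

s = pd.Series(start_arr)

# Step 1: keep positive values, turn everything else into NaN
masked = s.where(start_arr > 0)

# Step 2: forward-fill, so each NaN takes the last valid value before it
filled = masked.ffill()

# Step 3: NaNs that remain are the leading ones; they get start_value
out_arr = filled.fillna(start_value).to_numpy()

print(out_arr)
```

The leading entries are the only ones `ffill` cannot fill, because nothing valid precedes them, which is why `fillna` comes last.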

A numpy approach (inspired by this answer):

a = np.r_[start_value, start_arr]
# array([ 12. ,  -2. ,  -4. , -60. ,  -0.5,   2. ,   2. ,   1. ,  70. ,
#         -2. ,  -5. ,   2. ])

idx = np.maximum.accumulate(np.where(a>0, np.arange(a.shape[0]), 0))
# array([ 0,  0,  0,  0,  0,  5,  6,  7,  8,  8,  8, 11])

out_arr = a[idx][1:]
# array([12., 12., 12., 12.,  2.,  2.,  1., 70., 70., 70.,  2.])
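
If this is needed in more than one place, the three lines above could be wrapped in a small helper. This is a minimal sketch; the function name `fill_nonpositive` is made up for illustration and does not come from NumPy.

```python
import numpy as np

def fill_nonpositive(arr, start_value):
    """Replace each non-positive entry with the last positive entry before it.

    Entries with no positive predecessor get start_value.
    (Hypothetical helper, not a NumPy function.)
    """
    # Prepend start_value so that index 0 acts as the fallback source
    a = np.r_[start_value, arr]
    # Keep the index where the value is positive, otherwise 0;
    # the running maximum then carries the last "positive" index forward
    idx = np.maximum.accumulate(np.where(a > 0, np.arange(a.size), 0))
    # Drop the prepended element again
    return a[idx][1:]

start_arr = np.array([-2, -4, -60, -0.5, 2, 2, 1, 70, -2, -5, 2])
result = fill_nonpositive(start_arr, 12)
print(result)
```

Because index 0 is the fallback, the prepended value is used whether or not `start_value` itself is positive.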

Alain T. · Accepted Answer · 2023-04-04T14:52:28.383

You could build an array of source indices, one for each value in the output array. The source index is the index of the most recent positive value. You can compute these with the numpy.maximum.accumulate function after zeroing out the indices of the non-positive positions.

The leading non-positive items, which have no previous positive value, can be handled separately after this.

import numpy as np

start_value = 12
start_arr = np.array([-2, -4, -60, -0.5, 2, 2, 1, 70, -2, -5, 2])

source     = np.arange(start_arr.size)                 # all indices
source[start_arr <= 0] = 0                             # zero out non-positive positions
out_arr    = start_arr[np.maximum.accumulate(source)]  # pick each value from its source index
out_arr[out_arr<=0] = start_value                      # leading non-positives

print(out_arr)

# [12. 12. 12. 12.  2.  2.  1. 70. 70. 70.  2.]
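
One way to gain confidence that this matches the loop from the question is to compare both on random inputs. The sketch below does that; the function names `loop_reference` and `vectorized` are made up for this check, and it only covers non-empty float arrays.

```python
import numpy as np

def loop_reference(start_arr, start_value):
    # Same logic as the loop in the question
    out, ans = [], start_value
    for x in start_arr:
        if x > 0:
            ans = x
        out.append(ans)
    return np.array(out, dtype=float)

def vectorized(start_arr, start_value):
    # Same steps as the code above, wrapped in a function
    source = np.arange(start_arr.size)
    source[start_arr <= 0] = 0
    out = start_arr[np.maximum.accumulate(source)]  # fancy indexing returns a copy
    out[out <= 0] = start_value
    return out

rng = np.random.default_rng(0)
for _ in range(100):
    arr = rng.normal(size=rng.integers(1, 50))  # random length, mixed signs
    assert np.array_equal(loop_reference(arr, 12.0), vectorized(arr, 12.0))
```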

Reza Besharat · Answer 3 · 2023-04-04T14:21:07.743

Here is my solution using NumPy only (replace 12 with whatever value you want):

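import numpy as np

array = np.array([-2, -4, -60, -0.5, 2, 2, 1, 70, -2, -5, 2])
replacement = 12

# Note: this assumes the first element is non-positive, as in the example.
while True:
    Ind = np.argmax(array <= 0)          # first non-positive index (0 if there is none)
    if Ind > 0 or replacement is not None:
        if replacement is not None:
            array[Ind] = replacement     # the first hit gets the replacement value
            replacement = None
        else:
            array[Ind] = array[Ind - 1]  # later hits copy their left neighbour
    else:
        break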

edited Apr 04 '23 at 14:21

answered Apr 04 '23 at 13:54

Thank you for your suggestion, but a while loop is no improvement in terms of performance and is still building the array iteratively. – Tarquinius Apr 05 '23 at 07:39
Firstly, you do iterate but with the Numpy function, as you asked. Secondly, you do not build the array again but you only replace the elements which are memory-optimized. – Reza Besharat Apr 06 '23 at 08:26
