
I have a (61,77,365) numpy array full of boolean values.

Taking a random 1D slice along axis 2 (length 365) for illustration:

data = [False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False True False False False False True False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False True True False False False True True True True True True True True False False False False False False False False False False False False False False False False False False False True True False False False False False False True True True True True True False False False False False False False False False False False False True True False False False True 
False False False True True True False False True True False True True False False True False True True True True True True False False False False True True True]

I want to replace the True values with the length of their associated group of consecutive Trues, i.e.:

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0 0 8 8 8 8 8 8 8 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0 0 0 0 0 6 6 6 6 6 6 0 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0 0 1 0 0 0 3 3 3 0 0 2 2 0 2 2 0 0 1 0 6 6 6 6 6 6 0 0 0 0 3 3 3]

How can I do this efficiently for the 3D array? I want to avoid looping, as it would be very computationally expensive.

So far, I have used a cumulative sum that resets at every False, and then done the same for the reversed data. Adding the two results together and subtracting 1 wherever data is True gives the required answer, but it's convoluted and inefficient:

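import numpy as np

axis = -1  # the [..., ::-1] reversal used later assumes the last axis
no_reset = np.cumsum(data, axis=axis)  # running count of Trues, never resets
reset = (data == 0)
excess = np.maximum.accumulate(no_reset*reset, axis=axis)  # count frozen at the latest False
result = no_reset - excess  # count since the latest False
print(result)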

result = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 1 2 3 4 5 6 7 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 1 2 3 4 5 6 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 1 0 0 0 1 2 3 0 0 1 2 0 1 2 0 0 1 0 1 2 3 4 5 6 0 0 0 0 1 2 3]
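To make the forward pass easier to follow, here is the same sequence of calls on a short made-up array, printing the intermediate arrays. It only illustrates the question's own technique; the input values are invented for the example:

```python
import numpy as np

# made-up 1D example; the same calls work along axis=-1 of the 3D array
data = np.array([True, True, False, True, True, True, False, True])

no_reset = np.cumsum(data)                        # running count of Trues, never resets
reset = (data == 0)                               # True at every False in data
excess = np.maximum.accumulate(no_reset * reset)  # count frozen at the latest False
result = no_reset - excess                        # count since the latest False

print(no_reset.tolist())  # [1, 2, 2, 3, 4, 5, 5, 6]
print(excess.tolist())    # [0, 0, 2, 2, 2, 2, 5, 5]
print(result.tolist())    # [1, 2, 0, 1, 2, 3, 0, 1]
```

Each False "freezes" the running total, and subtracting the frozen value restarts the count.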

no_reset_rev = np.cumsum(data[..., ::-1], axis=axis)
reset_rev = (data[..., ::-1] == 0)
excess_rev = np.maximum.accumulate(no_reset_rev*reset_rev, axis=axis)
result_rev = no_reset_rev - excess_rev
print(result_rev)

result_rev = [1 2 3 0 0 0 0 1 2 3 4 5 6 0 1 0 0 1 2 0 1 2 0 0 1 2 3 0 0 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

# each True is counted in both passes, so subtract 1 for it
final_res = result + result_rev[..., ::-1] - data.astype(int)
print(final_res)

final_res = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0 0 8 8 8 8 8 8 8 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0 0 0 0 0 6 6 6 6 6 6 0 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0 0 1 0 0 0 3 3 3 0 0 2 2 0 2 2 0 0 1 0 6 6 6 6 6 6 0 0 0 0 3 3 3]
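Because every call above works along the last axis, the two passes can be packaged into one function that runs on the whole (61, 77, 365) array without a Python loop. This is a minimal sketch of the same cumsum/maximum.accumulate technique, not a different algorithm; the names run_length_fill and count_since_reset are hypothetical, and it assumes runs are counted along the last axis:

```python
import numpy as np


def run_length_fill(data):
    """Replace each True with the length of its run along the last axis.

    Assumes `data` is a boolean array of any shape, e.g. (61, 77, 365).
    """
    def count_since_reset(a):
        # running count of Trues that restarts at every False (last axis)
        no_reset = np.cumsum(a, axis=-1)
        excess = np.maximum.accumulate(no_reset * (a == 0), axis=-1)
        return no_reset - excess

    forward = count_since_reset(data)
    # run the same pass on the reversed data, then flip the result back
    backward = count_since_reset(data[..., ::-1])[..., ::-1]
    # a True at position i of a run of length L gets i + (L - i + 1), so subtract 1
    return forward + backward - data.astype(int)


example = np.array([True, True, False, True, False, False, True, True, True])
print(run_length_fill(example).tolist())  # [2, 2, 0, 1, 0, 0, 3, 3, 3]

rng = np.random.default_rng(0)
big = rng.random((61, 77, 365)) < 0.1
print(run_length_fill(big).shape)  # (61, 77, 365)
```

For a False position both passes give 0 and nothing is subtracted, so the zeros are preserved.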

2 Answers


I'm not sure whether this is faster, but you can try it this way. First, since your data is already a NumPy array, you can use np.where to convert the booleans to 0s and 1s.

myarray = np.where(data, 1, 0)

Next, get the indices of the 1s using np.where or np.nonzero. (With a single argument, np.where is shorthand for np.nonzero, and the NumPy docs recommend calling np.nonzero directly.)

indexes = np.nonzero(myarray == 1)
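A quick check on a tiny made-up array shows what these two steps return. Note that np.nonzero returns a tuple with one index array per dimension, which is why the next step uses indexes[0]:

```python
import numpy as np

data = np.array([False, True, True, False, True])  # made-up example

myarray = np.where(data, 1, 0)      # booleans -> 0/1 integers
indexes = np.nonzero(myarray == 1)  # a tuple: one index array per dimension

print(myarray.tolist())     # [0, 1, 1, 0, 1]
print(indexes[0].tolist())  # [1, 2, 4]
# with a single argument, np.where gives the same indices as np.nonzero
print(np.array_equal(np.where(myarray == 1)[0], indexes[0]))  # True
```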

Then use a function posted by Unutbu (https://stackoverflow.com/a/7353335/16836078) to split the indices into groups of consecutive values:

def consecutive(data, stepsize=1):
    # split wherever neighbouring indices are not `stepsize` apart
    return np.split(data, np.where(np.diff(data) != stepsize)[0]+1)

split_index = consecutive(indexes[0])
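Here is the consecutive function on a small made-up index array. np.diff finds the gaps between neighbouring indices, and np.split cuts the array after every gap that isn't equal to stepsize:

```python
import numpy as np


def consecutive(data, stepsize=1):
    # split wherever neighbouring indices are not `stepsize` apart
    return np.split(data, np.where(np.diff(data) != stepsize)[0] + 1)


idx = np.array([1, 2, 4, 7, 8, 9])  # made-up indices of Trues
runs = consecutive(idx)
print([r.tolist() for r in runs])  # [[1, 2], [4], [7, 8, 9]]

# a slice with no Trues yields a single empty piece
print(len(consecutive(np.array([], dtype=int))))  # 1
```

The empty-piece case matters for the full array: assigning through an empty index array writes nothing, so slices without any True are left as 0.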

For the last part (apologies, since I know you wanted to avoid loops), a for loop assigns the length of each group of consecutive indices back to the array:

for i in split_index:
    number = len(i)
    myarray[i] = number

myarray

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0,
       0, 0, 0, 6, 6, 6, 6, 6, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2,
       2, 0, 0, 0, 1, 0, 0, 0, 3, 3, 3, 0, 0, 2, 2, 0, 2, 2, 0, 0, 1, 0,
       6, 6, 6, 6, 6, 6, 0, 0, 0, 0, 3, 3, 3])
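Putting the 1D steps of this answer together, a minimal self-contained sketch looks like this. The wrapper name fill_runs_1d is hypothetical; the body is just the steps above on a short made-up input:

```python
import numpy as np


def consecutive(data, stepsize=1):
    # split wherever neighbouring indices are not `stepsize` apart
    return np.split(data, np.where(np.diff(data) != stepsize)[0] + 1)


def fill_runs_1d(data):
    """The answer's steps for one 1D boolean slice."""
    myarray = np.where(data, 1, 0)
    indexes = np.nonzero(myarray == 1)
    for run in consecutive(indexes[0]):
        myarray[run] = len(run)  # an empty run (no Trues) assigns nothing
    return myarray


print(fill_runs_1d(np.array([True, True, False, True, False, True, True, True])).tolist())
# [2, 2, 0, 1, 0, 3, 3, 3]
```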

Extra

I tried running these operations on your full array, and it looks like a few things need to change.

To find the indices of the 1s and apply the consecutive function to every 1D slice, you can use list comprehensions:

# one index array per (dimension1, dimension2) slice, in C order
indexes = [np.nonzero(j == 1)[0] for i in myarray for j in i]
split_index = [consecutive(i) for i in indexes]

Finally, there's a nested for loop to assign the values to the original array.

# split_index is a flat, ragged list (one list of runs per slice), so it
# can't be reshaped; walk it alongside the first two axes of myarray instead
for (dimension1, dimension2), runs in zip(np.ndindex(myarray.shape[:2]), split_index):
    for k in runs:
        myarray[dimension1, dimension2, k] = len(k)
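For a runnable check of the 3D version, here is the same approach on a small made-up (2, 2, 4) array. The wrapper name fill_runs_3d is hypothetical. Keep in mind that it still loops in Python over every (61 × 77) slice, so it isn't obviously faster than the cumsum approach in the question; I haven't timed it:

```python
import numpy as np


def consecutive(data, stepsize=1):
    # split wherever neighbouring indices are not `stepsize` apart
    return np.split(data, np.where(np.diff(data) != stepsize)[0] + 1)


def fill_runs_3d(data):
    """The answer's approach applied to every 1D slice along the last axis."""
    myarray = np.where(data, 1, 0)
    # one index array per (dimension1, dimension2) slice, in C order
    indexes = [np.nonzero(j == 1)[0] for i in myarray for j in i]
    split_index = [consecutive(i) for i in indexes]
    # np.ndindex walks the first two axes in the same C order as the comprehension
    for (d1, d2), runs in zip(np.ndindex(myarray.shape[:2]), split_index):
        for k in runs:
            myarray[d1, d2, k] = len(k)
    return myarray


sample = np.array([[[True, True, False, True], [False, False, False, False]],
                   [[True, False, True, True], [True, True, True, False]]])
print(fill_runs_3d(sample).tolist())
# [[[2, 2, 0, 1], [0, 0, 0, 0]], [[1, 0, 2, 2], [3, 3, 3, 0]]]
```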

The key is to get the indices of the Trues along with the length of each run of Trues.

First, create a DataFrame from the data:

import pandas as pd

df = pd.DataFrame({'data': data})

Next, label each run of equal consecutive values with this cumulative-sum trick (the counter increases by one every time the value changes):

tf = (df['data'] != df['data'].shift()).cumsum()
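On a short made-up Series, the labels look like this. Comparing each value with the previous one marks the start of every run, and the cumulative sum turns those marks into run labels:

```python
import pandas as pd

s = pd.Series([True, True, False, True, False, False, True])  # made-up example
tf = (s != s.shift()).cumsum()  # increments whenever the value changes

print(tf.tolist())     # [1, 1, 2, 3, 4, 4, 5]
print(tf[s].tolist())  # [1, 1, 3, 5]  (labels of the True positions only)
```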

Then create another DataFrame containing only the labels of the True values:

df2 = pd.DataFrame(tf[df['data']])

Then reset the index so that the positions of the True values become a column (this is important), group by run label, and aggregate the positions into lists:

df2 = df2.reset_index().rename(columns={'index':'trues'}).groupby('data',as_index=False).agg(list)
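Running the steps so far on a made-up input shows the shape of df2 at this point: one row per run of Trues, with the run label in 'data' and the list of positions in 'trues':

```python
import pandas as pd

data = [True, True, False, True, False, False, True]  # made-up example
df = pd.DataFrame({'data': data})
tf = (df['data'] != df['data'].shift()).cumsum()
df2 = pd.DataFrame(tf[df['data']])
df2 = df2.reset_index().rename(columns={'index': 'trues'}).groupby('data', as_index=False).agg(list)

print(df2['data'].tolist())   # [1, 3, 5]
print(df2['trues'].tolist())  # [[0, 1], [3], [6]]
```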

Then get the length of each list:

df2['data'] = df2['trues'].apply(lambda x:len(x))

Then explode the lists so that each position gets its own row:

df2 = df2.explode('trues')

Then set the positions as the index and remove its name, so they can be used to locate the rows in the original df:

df2 = df2.set_index('trues', drop=True)
df2.index.name = ''
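Continuing from a made-up groupby result, this shows what the length, explode and set_index steps produce. The pandas docs note that exploded values come out as object dtype, so converting the index with astype(int) before using it as positions is a sensible precaution:

```python
import pandas as pd

# made-up output of the groupby step: run label and the positions in that run
df2 = pd.DataFrame({'data': [1, 3, 5], 'trues': [[0, 1], [3], [6]]})
df2['data'] = df2['trues'].apply(len)  # run lengths: 2, 1, 1
df2 = df2.explode('trues')             # one row per position
df2 = df2.set_index('trues', drop=True)
df2.index.name = ''

print(df2['data'].tolist())            # [2, 2, 1, 1]
print(df2.index.astype(int).tolist())  # [0, 1, 3, 6]
```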

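Now convert the column to integers (so the Falses become 0) and assign the run lengths back by position:

df['data'] = df['data'].astype(int)
df.iloc[df2.index.astype(int), 0] = df2['data'].to_numpy()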

All the code together:

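import pandas as pd

df = pd.DataFrame({'data': data})
# label each run of equal consecutive values
tf = (df['data'] != df['data'].shift()).cumsum()
# keep only the labels of the True positions
df2 = pd.DataFrame(tf[df['data']])
# collect the positions belonging to each run
df2 = df2.reset_index().rename(columns={'index':'trues'}).groupby('data',as_index=False).agg(list)
df2['data'] = df2['trues'].apply(len)  # run length
df2 = df2.explode('trues')
df2 = df2.set_index('trues', drop=True)
df2.index.name = ''
# switch to integers (False -> 0), then write the run lengths back by position
df['data'] = df['data'].astype(int)
df.iloc[df2.index.astype(int), 0] = df2['data'].to_numpy()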

The final result is in df['data'].
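To check the pipeline end to end, here it is wrapped in a function on a made-up 1D input. The name fill_runs_pandas is hypothetical. It handles one 1D slice, so on the (61, 77, 365) array you would still need to call it once per slice along the last axis:

```python
import numpy as np
import pandas as pd


def fill_runs_pandas(data):
    """The answer's pandas steps for one 1D boolean array."""
    df = pd.DataFrame({'data': data})
    tf = (df['data'] != df['data'].shift()).cumsum()  # run labels
    df2 = pd.DataFrame(tf[df['data']])                 # labels of the Trues only
    df2 = (df2.reset_index().rename(columns={'index': 'trues'})
              .groupby('data', as_index=False).agg(list))
    df2['data'] = df2['trues'].apply(len)              # run lengths
    df2 = df2.explode('trues').set_index('trues', drop=True)
    df['data'] = df['data'].astype(int)                # False -> 0
    df.iloc[df2.index.astype(int), 0] = df2['data'].to_numpy()
    return df['data'].to_numpy()


data = np.array([True, True, False, True, False, True, True, True])  # made-up example
print(fill_runs_pandas(data).tolist())  # [2, 2, 0, 1, 0, 3, 3, 3]
```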

SomeDude