1

I have a global dataset of daily values approximately 15000 days x 361 lat x 576 lon. This dataset is binary - there are ones in locations/days that meet a criteria, and 0s where the condition is not met. I want to only keep 1s where they occur 3 or more days in a row. Currently working with the data as a numpy np array but I also work with xarray as well.

My initial idea was a 3 day rolling sum and checking where it was 3, but this only finds the middle days of three+ day periods, not the ends.

Any ideas for an efficient way to accomplish this? ideally without explicitly looping through each item - as that would take a long time. Thanks in advance!

fivebyfive
  • 39
  • 5
  • The question is unclear. In the absolute simplest case where there are only three days of data and this column has `1` for all three days, would that result in `0 0 1`, or `1 1 1`, or something else? – John Gordon Jun 11 '23 at 16:21
  • If the condition is met on all three days, the data is 1 1 1. – fivebyfive Jun 11 '23 at 16:24
  • Does this answer your question? [Find Consecutive Repeats of Specific Length in NumPy](https://stackoverflow.com/questions/59662725/find-consecutive-repeats-of-specific-length-in-numpy) – jared Jun 11 '23 at 16:25
  • So on the first day, even though the condition has not (yet) occurred three days in a row, you still want that to be recorded as a `1`? – John Gordon Jun 11 '23 at 16:28
  • Exactly (which is the trouble). Since it is in a group of 3 or more, I need to retain it. – fivebyfive Jun 11 '23 at 16:30
  • If it helps for clarity, I am identifying individual 'events' that occur over a timespan of 3 or more days. – fivebyfive Jun 11 '23 at 16:34

2 Answers2

2

Approach this by first finding the sets of 3, then shuffling them back two elements with or. Here's the easy to understand version:

import numpy as np
np.random.seed(17)
rands = np.random.randint(2, size=30)
# [1 1 1 0 0 1 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 0 1 1 1] 

and_rands = rands[:-2] & rands[1:-1] & rands[2:]
# [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]


lhs = np.concatenate((and_rands, np.zeros(2,dtype=and_rands.dtype)))
mid = np.concatenate((np.zeros(1,dtype=and_rands.dtype), and_rands, np.zeros(1, dtype=and_rands.dtype)))
rhs = np.concatenate((np.zeros(2,dtype=and_rands.dtype), and_rands))

result = lhs | mid | rhs
# [1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1]

And here's the same thing, but scaled and a bit more memory efficient:

import numpy as np
np.random.seed(17)
DAYS, DIM2, DIM3 =15000, 361, 576
rands = np.random.randint(2, size=(DAYS, DIM2, DIM3), dtype='i1')
ret = np.zeros((DAYS, DIM2, DIM3), dtype=rands.dtype)
ret[2:, :, :] |= rands[2:, :, :]
ret[2:, :, :] &= rands[1:-1, :, :]
ret[2:, :, :] &= rands[:-2, :, :]
ret[1:-1, :, :] |= ret[2:, :, :]
ret[:-2, :, :] |= ret[1:-1:, :, :]


print(rands[:30, 0, 0])
# [1 1 1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 1 0 1 1 0 0
 1 1 0 1 1 1 0 1 1 1 0 0 1]
print(ret[:30, 0, 0])
# [1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0
 0 0 0 1 1 1 0 1 1 1 0 0 0]

Carbon
  • 3,828
  • 3
  • 24
  • 51
  • Thanks so much! I'm getting an error with the second script though - 'TypeError: ufunc 'bitwise_or' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'''. The array is already cast as i1 so I'm a bit confused what the problem is. – fivebyfive Jun 12 '23 at 02:06
  • 1
    Solved issue! Thank you so much – fivebyfive Jun 12 '23 at 02:59
1

A bit more extensible method

from skimage.util import view_as_windows
import numpy as np

def find_blocks(in_arr, window_shape = (3,1,1)):
    dims = len(window_shape)
    windowed_view = view_as_windows(in_arr, window_shape)
    loc = np.logical_and.reduce(windowed_view, tuple(range(-dims, 0)))
    loc_shape = loc.shape
    loc = np.pad(loc, tuple((i-1, 0) for i in window_shape))
    windowed_loc = view_as_windows(loc, loc_shape)
    return np.logical_or.reduce(windowed_loc, tuple(range(dims)))

If you want numpy only, I have a recipe here that replicates view_as_windows (with few added functionalities even, like an axis parameter so you don't need your window_shape to have the same number of dimensions as your in_arr)

def find_blocks_np(in_arr, window = 3, axis = 0):
    windowed_view = window_nd(in_arr, window, axis = axis)
    loc = np.logical_and.reduce(windowed_view, axis + 1)
    loc_shape = loc.shape
    padder = tuple((i-j, 0) for i, j in zip(in_arr.shape, loc.shape))
    loc = np.pad(loc, padder)
    windowed_loc = window_nd(loc, loc_shape)
    return np.logical_or.reduce(windowed_loc, 0)
Daniel F
  • 13,620
  • 2
  • 29
  • 55