Python/Numpy - Vectorized implementation of this for loop?

Question

This is a lethargic implementation of a cloud mask based on interpolation across temporal channels of a satellite image. The image array is of shape (n_samples x n_months x width x height x channels). The channels are not just RGB, but also from the invisible spectrum such as SWIR, NIR, etc. One of the channels (or bands, in the satellite image world) is a cloud mask that tells me 0 means "no cloud" and 1024 or 2048 means "cloud" in that pixel.

I'm using this pixel-wise cloud mask channel to change the values on all remaining channels by interpolation between the previous/next month. This implementation is super slow and I'm having a hard time coming up with vectorized implementation.

Is it possible to vectorize this implementation? What is it?
Any suggestion on how to deduce the logic of vectorized implementation of complex array operations? In other words, how do I learn the art of vectorization?

I'm a novice, so please excuse my ignorance.

n_samples = 1055
n_months = 12
width = 40
height = 40
channels = 13 # channel 13 is the cloud mask, based on which the first 12 channel pixels are interpolated)

# This function fills nan values in a list by interpolation
def fill_nan(y):
    nans = np.isnan(y)
    x = lambda z: z.nonzero()[0]
    y[nans]= np.interp(x(nans), x(~nans), y[~nans])
    return y

#for loop to first fill cloudy pixels with nan
for sample in range(1055):
    for temp in range(12):
        for w in range(40):
            for h in range(40):
                if Xtest[sample,temp,w,h,13] > 0:
                    Xtest[sample,temp,w,h,:12] = np.nan

#for loop to fill nan with interpolated values
for sample in range(1055):
    for w in range(40):
        for h in range(40):
            for ch in range(12):
                Xtest[sample,: , w, h, ch] = fill_nan(Xtest[sample,: , w, h, ch])

I am not sure what you are trying to do in the second loop and the function, it throws error for a random numpy array. You probably need to also share the a sample array that you want solved (smaller one, not the one with the full size). — Ananda, Feb 02 '21 at 04:06
Actually the `fill_nan` function doesn't make sense IMO. `np.interp()` requires arrays of same size to be passed in the first and second argument. This is not going to the case with `x(nans)` and `x(~nans)` a lot of time. — Ananda, Feb 02 '21 at 04:28
Ananda, I'm trying to fill nan's with values interpolated between the same pixel in subsequent and previous temporal value. For example, nan is the value for a particular pixel in channel 5 for April. I want to interpolate the from the same pixel in the same channel 5 from the months of March and May to be fille din for April. For a simplified version of this question, see this post https://stackoverflow.com/questions/6518811/interpolate-nan-values-in-a-numpy-array. — Harish Karthik Lakshmanan, Feb 02 '21 at 21:58
What I meant is that I don't think your implementation is right. `np.interp(x(nans), x(~nans), y[~nans])` is going to throw an error most of the time. The only time it will work is if `y` has the same number of nans and not-nans (because that function needs same length of args in the first and second arg) — Ananda, Feb 03 '21 at 03:33

score 2 · Answer 1 · answered Feb 02 '21 at 03:53

For the first loop,

import numpy as np

Xtest = np.random.rand(10, 3, 2, 4, 14)
Xtest_v = Xtest.copy()

for sample in range(10):
    for temp in range(3):
        for w in range(2):
            for h in range(4):
                if Xtest[sample,temp,w,h,13] > 0:
                    Xtest[sample,temp,w,h,:12] = np.nan

Xtest_v[..., :12][Xtest_v[..., 13]>0] = np.nan

print(np.nansum(Xtest))
print(np.nansum(Xtest_v))

You can verify that both the arrays are the same by printing out the sum ignoring nans.

Thank you so much. It works like a charm. – Harish Karthik Lakshmanan Feb 02 '21 at 21:50 — Harish Karthik Lakshmanan, Feb 02 '21 at 21:50

Python/Numpy - Vectorized implementation of this for loop?

1 Answers1