I have a satellite image which is quite noisy. I wanted to try a Water Anomaly Filter proposed here. It is straight-foreword and appears to be suited for my use case, as it expects that the "real" structures on the ground are much larger than the noise, which appears only in isolated pixels. The proposed algorithm works as follow:
- let a 5x5 pixel kernel run over the input image
- calculate the
mean
and standard deviation (SD
) of each kernel subset, but exclude the center pixel - remove outliers (i.e. outside
mean +/- 1 SD
) and do step 3 again to calculate mean and SD without outliers (mean_wo
andSD_wo
) - if the value of the pixel in the center of the kernel is outside
mean_wo +/- 1 SD_wo
, replace it bymean_wo
. Else keep the original value.
This may sound complicated, but it keeps as many original data values as possible, which is crucial for me.
The original algorithm is implemented in IDL, but I need it in Python. My data are in netCDF-files and I usually work with xarray
. I already looked into xarray.DataArray.rolling
and xarray.apply_ufunc
, but I didn't found a way to implement this algorithm with its kernel and if-statement. I'm also open to solutions with other packges, but I also had no success to implement an if-condition into a kernel with numpy
and scipy
so far.
Of course, I could implement the algorithm with brute force and let the kernel run over the image with two for loops and so on, but since I have a few satellite images, each with thousands x thousands pixels and with several bands, I would like to find a faster solution.
import xarray as xr
import numpy as np
np.random.seed(42)
# define sample data
data = np.random.rand(25).reshape(5, 5)
data[1,1] = data[3,3] = 5
xds = xr.DataArray(data, dims=['y', 'x'], name='raster')
print(xds)
#<xarray.DataArray 'raster' (y: 5, x: 5)>
#array([[0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864],
# [0.15599452, 5. , 0.86617615, 0.60111501, 0.70807258],
# [0.02058449, 0.96990985, 0.83244264, 0.21233911, 0.18182497],
# [0.18340451, 0.30424224, 0.52475643, 5. , 0.29122914],
# [0.61185289, 0.13949386, 0.29214465, 0.36636184, 0.45606998]])
#Dimensions without coordinates: y, x
# apply rolling to apply a mean filter. For sake of simplicity only with a 3x3 kernel size
print(xds.rolling({"x": 3, "y": 3}, center = True).mean())
#<xarray.DataArray 'raster' (y: 5, x: 5)>
#array([[ nan, nan, nan, nan, nan],
# [ nan, 1.10026178, 1.19592772, 0.54318239, nan],
# [ nan, 0.98416787, 1.59010905, 1.02421734, nan],
# [ nan, 0.43098129, 0.96018785, 0.90635209, nan],
# [ nan, nan, nan, nan, nan]])
#Dimensions without coordinates: y, x
# However, I which to implement the filter described above, which should yield (again with a kernel size of 3x3):
#<xarray.DataArray 'raster' (y: 5, x: 5)>
#array([[ nan, nan, nan, nan, nan],
# [ nan, 0.61279450, 0.86617615, 0.60111501, nan],
# [ nan, 0.96990985, 0.83244264, 0.21233911, nan],
# [ nan, 0.30424224, 0.52475643, 0.39464610, nan],
# [ nan, nan, nan, nan, nan]])
#Dimensions without coordinates: y, x