How to replace outliers of 2d array with estimated background values?

Question

I have a 2D array called no2 that is related to two other 2D arrays, sza and vza.

Test data (test.npz, 450 KB) can be downloaded from Google Drive.

Here's the overview:

import numpy as np
import matplotlib.pyplot as plt

data = np.load('test.npz')
sza = data['sza']
vza = data['vza']
no2 = data['no2']

fig, axs = plt.subplots(2, 2, figsize=(8, 6))

ax1, ax2, ax3, ax4 = axs.flat

m = ax1.pcolormesh(no2)
plt.colorbar(m, ax=ax1)
ax1.set_title('no2')

m = ax2.pcolormesh(sza)
plt.colorbar(m, ax=ax2)
ax2.set_title('sza')

m = ax3.pcolormesh(vza)
plt.colorbar(m, ax=ax3)
ax3.set_title('vza')

s = ax4.scatter(sza, no2, c=vza, s=1)
plt.colorbar(s, ax=ax4, label='vza')
ax4.set_xlabel('sza')
ax4.set_ylabel('no2')

plt.tight_layout()

I want to replace the two regions of high no2 with values estimated from the surrounding background (the low no2 values), so that those regions look like the rest of the field.

Because no2 appears to depend roughly linearly on sza (see the last subplot), I came up with three ideas:

Curve fit

Fit no2 against sza within several vza bins, then use the fits to calculate background no2 values that replace the high ones:

fig, axs = plt.subplots(3, 4, figsize=(12, 6))

# 12 vza bins of width 5: [5, 10), [10, 15), ..., [60, 65)
# (lo instead of bin, which shadows a Python builtin; >= so edge values are not dropped)
for ax, lo in zip(axs.flat, range(5, 65, 5)):
    mask = (vza >= lo) & (vza < lo + 5)
    s = ax.scatter(sza[mask], no2[mask], c=vza[mask], s=1)
    plt.colorbar(s, ax=ax, label='vza')
    ax.set_title(f'{lo} <= vza < {lo + 5}')
    ax.set_xlabel('sza')
    ax.set_ylabel('no2')

plt.tight_layout()
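The binning loop only plots each bin. A minimal sketch of the fitting step, assuming (as the last scatter plot suggests) that no2 is roughly linear in sza within each vza bin, fits a straight line per bin with np.polyfit and evaluates it at every pixel of that bin. The function name binned_linear_background and the synthetic arrays standing in for test.npz are hypothetical.

```python
import numpy as np

def binned_linear_background(sza, vza, no2, edges):
    """Fit no2 = slope * sza + intercept separately in each vza bin
    [edges[i-1], edges[i]) and evaluate each fit at the pixels of its bin.
    Pixels outside every bin (or in bins with < 2 pixels) stay NaN."""
    background = np.full(no2.shape, np.nan)
    coeffs = {}
    # np.digitize gives i such that edges[i-1] <= vza < edges[i]
    idx = np.digitize(vza, edges)
    for i in range(1, len(edges)):
        mask = idx == i
        if mask.sum() < 2:
            continue  # a line needs at least two points
        slope, intercept = np.polyfit(sza[mask], no2[mask], deg=1)
        coeffs[(edges[i - 1], edges[i])] = (slope, intercept)
        background[mask] = slope * sza[mask] + intercept
    return background, coeffs

# Hypothetical stand-in for test.npz: sza varies along rows, vza along columns
rng = np.random.default_rng(0)
vza, sza = np.meshgrid(np.linspace(5, 64, 80), np.linspace(20, 70, 60))
no2 = 1e-4 + 2e-6 * sza + rng.normal(0, 1e-6, sza.shape)

edges = np.arange(5, 70, 5)  # same 5-degree vza bins as the plotting loop
background, coeffs = binned_linear_background(sza, vza, no2, edges)
```

On the real data the plume pixels would also enter each fit and pull the line upward, so they would need to be excluded (or down-weighted) before fitting.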

I tried to fit a curve for one bin (45 < vza < 50):

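import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

mask = (vza > 45) & (vza < 50)
xdata = sza[mask]
ydata = no2[mask]
popt, pcov = curve_fit(func, xdata, ydata, p0=(1, 1e-5, 1))

# Plot the data as points: the pixels are not sorted by sza, so a line plot zig-zags
plt.plot(xdata, ydata, 'b.', markersize=1, label='data')

# Sort by sza before drawing the fitted curve
order = np.argsort(xdata)
plt.plot(xdata[order], func(xdata[order], *popt), 'r-',
         label='fit: a=%.3g, b=%.3g, c=%.3g' % tuple(popt))
plt.legend()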

However, the fit did not give what I wanted.
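One possible reason the fit misses the background is that ordinary least squares treats the plume pixels like any other data, so they pull the curve upward. A minimal sketch, assuming a straight-line relation between no2 and sza inside one vza bin (as the scatter plot suggests, rather than the exponential model) and synthetic data in place of test.npz: fit with np.polyfit, drop points whose residual exceeds n_sigma times the residual standard deviation, and refit until the inlier set stops changing (iterative sigma clipping). The function name sigma_clipped_linfit, n_sigma=3 and the synthetic arrays are hypothetical.

```python
import numpy as np

def sigma_clipped_linfit(x, y, n_sigma=3.0, max_iter=20):
    """Fit y = slope * x + intercept, repeatedly excluding points whose
    |residual| exceeds n_sigma * std of the residuals of the current inliers."""
    keep = np.ones(x.shape, dtype=bool)
    for _ in range(max_iter):
        slope, intercept = np.polyfit(x[keep], y[keep], deg=1)
        resid = y - (slope * x + intercept)
        sigma = resid[keep].std()
        new_keep = np.abs(resid) < n_sigma * sigma
        if np.array_equal(new_keep, keep):
            break  # inlier set is stable
        keep = new_keep
    return slope, intercept, keep

rng = np.random.default_rng(0)
x = rng.uniform(20, 70, 2000)                      # stand-in for sza in one vza bin
y = 1e-4 + 2e-6 * x + rng.normal(0, 1e-6, x.size)  # hypothetical linear background
plume = x > 68                                     # hypothetical plume pixels
y[plume] += 5e-5

ols_slope, _ = np.polyfit(x, y, deg=1)             # plain fit, biased by the plume
slope, intercept, keep = sigma_clipped_linfit(x, y)
```

Pixels with keep == False and a positive residual are the candidates to replace with slope * x + intercept.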

Is it possible to do both of the following?

- Fit a curve and use it to get background values for the high-value pixels

- Add random noise to the fitted background values (this could be repeated several times so the filled values look more like the real surrounding background)

Or is there a better method?
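The two steps listed (fit the background, then add noise) can be combined in a short sketch. It assumes a background array is already available (for example from a per-bin fit) and that the outlier pixels are already flagged in a boolean mask; the noise standard deviation is estimated from the residuals of the unflagged pixels, so the filled values get roughly the same scatter as the real background. The function name fill_with_noisy_background, the plane-shaped background and the disk-shaped plume are hypothetical.

```python
import numpy as np

def fill_with_noisy_background(no2, background, outlier_mask, seed=None):
    """Return a copy of no2 where flagged pixels are replaced by
    background + Gaussian noise; unflagged pixels are left unchanged."""
    rng = np.random.default_rng(seed)
    # Noise level estimated from how the real background scatters around the fit
    noise_std = (no2[~outlier_mask] - background[~outlier_mask]).std()
    filled = no2.copy()
    n_bad = int(outlier_mask.sum())
    filled[outlier_mask] = background[outlier_mask] + rng.normal(0.0, noise_std, n_bad)
    return filled

rng = np.random.default_rng(1)
yy, xx = np.mgrid[0:50, 0:50]
background = 1e-4 + 1e-6 * yy                     # hypothetical fitted background
no2 = background + rng.normal(0, 1e-6, background.shape)
plume = (yy - 25) ** 2 + (xx - 25) ** 2 < 36      # hypothetical plume footprint
no2[plume] += 5e-5

# Several realizations, one per seed
realizations = [fill_with_noisy_background(no2, background, plume, seed=s) for s in range(3)]
```

Each seed gives a different but statistically similar fill, which matches the idea of running the noise step several times.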

Gradient

I also computed the gradient, hoping it would make the high-value regions stand out more:

# https://stackoverflow.com/questions/34003993/generating-gradient-map-of-2d-array
grad = np.gradient(no2)
fulgrad = np.sqrt(grad[0]**2 + grad[1]**2)

fig, axs = plt.subplots(1, 2, figsize=(6, 3))

ax1, ax2 = axs.flat

m = ax1.pcolormesh(no2)
plt.colorbar(m, ax=ax1)
ax1.set_title('no2')

m = ax2.pcolormesh(fulgrad)
plt.colorbar(m, ax=ax2)
ax2.set_title('no2 gradient')

plt.tight_layout()

However, it only highlights the outlines of the high-value regions.
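The outline effect is expected: the gradient is large only where values change, so a plateau-like plume has a near-zero gradient inside it. A minimal sketch on a synthetic flat-topped blob compares the gradient mask with a robust threshold on the values themselves (median plus k times a MAD-based standard deviation), which flags the whole region. The threshold factors (1e-5 for the gradient, k = 5) and the synthetic image are assumptions; on the real data, where no2 also trends with sza, the threshold would be applied to residuals from a background fit rather than to raw values.

```python
import numpy as np

rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:40, 0:40]
field = 1e-4 + rng.normal(0, 1e-6, yy.shape)     # hypothetical flat, noisy background
blob = (yy - 20) ** 2 + (xx - 20) ** 2 < 100     # hypothetical flat-topped plume, radius 10
field[blob] += 5e-5

# Gradient magnitude, as in the question: large only where values change
gy, gx = np.gradient(field)
grad_mask = np.hypot(gy, gx) > 1e-5              # flags the rim only

# Robust threshold on the values: median + k * (1.4826 * MAD)
med = np.median(field)
robust_std = 1.4826 * np.median(np.abs(field - med))
value_mask = field > med + 5 * robust_std        # flags the whole plume
```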

Image processing

I can't figure out how to use scikit-learn to replace only the high values while keeping the background unchanged.
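One image-processing option that changes only the flagged pixels is harmonic inpainting: each masked pixel is repeatedly set to the mean of its four neighbours (Jacobi iteration of Laplace's equation), while every unmasked pixel is held fixed and acts as a boundary condition. This is a plain NumPy sketch, not a scikit-learn feature; the function name harmonic_fill, the iteration count and the synthetic plane are assumptions. If scikit-image is installed, skimage.restoration.inpaint_biharmonic(image, mask) performs a related (biharmonic) fill.

```python
import numpy as np

def harmonic_fill(image, mask, n_iter=2000):
    """Fill pixels where mask is True with a smooth surface that matches
    the surrounding unmasked pixels; unmasked pixels are never modified."""
    filled = image.astype(float)            # astype returns a copy
    filled[mask] = image[~mask].mean()      # initial guess for the hole
    for _ in range(n_iter):
        # Edge padding only matters for masked pixels on the image border
        padded = np.pad(filled, 1, mode='edge')
        neighbour_mean = 0.25 * (padded[:-2, 1:-1] + padded[2:, 1:-1]
                                 + padded[1:-1, :-2] + padded[1:-1, 2:])
        filled[mask] = neighbour_mean[mask]
    return filled

yy, xx = np.mgrid[0:40, 0:40]
plane = 1e-4 + 1e-6 * yy + 2e-6 * xx        # hypothetical linear background
hole = np.zeros(plane.shape, dtype=bool)
hole[15:25, 15:25] = True                   # hypothetical outlier region
image = plane.copy()
image[hole] += 5e-5

filled = harmonic_fill(image, hole)
```

A linear background is recovered exactly inside the hole. The filled area is smooth, so on noisy data it will look less noisy than its surroundings unless noise with the background's residual standard deviation is added afterwards.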

Accepted answer, answered Feb 13 '22 at 21:36

I finally figured out how to replace the high values with estimated background values.

Use the dual-tree complex wavelet transform baseline (baseline_dt) from scikit-ued:

import numpy as np
import matplotlib.pyplot as plt
from skued import baseline_dt

data = np.load('test.npz')
no2 = data['no2']

# Estimate the smooth background with the dual-tree complex wavelet transform
baseline = baseline_dt(no2, wavelet='qshift3', level=6, max_iter=150)

fig, axs = plt.subplots(1, 3, figsize=(12, 4))

ax1, ax2, ax3 = axs.flat

m = ax1.imshow(no2, vmin=0, vmax=7e-4)
plt.colorbar(m, ax=ax1)
ax1.set_title('no2')

m = ax2.imshow(baseline, vmin=0, vmax=7e-4)
plt.colorbar(m, ax=ax2)
ax2.set_title('baseline')

m = ax3.imshow(no2 - baseline, vmin=0, vmax=7e-4)
plt.colorbar(m, ax=ax3)
ax3.set_title('no2 - baseline')

plt.tight_layout()
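The plots show no2 - baseline, while the question asks to replace only the high values and keep everything else. A minimal sketch of that last step, assuming baseline is the array returned by baseline_dt, flags pixels whose residual is more than n_sigma robust standard deviations (MAD-based) above the median residual and swaps in the baseline there. The threshold rule, the function name replace_above_baseline and the synthetic baseline (used so the sketch runs without scikit-ued or test.npz) are assumptions, not part of the answer.

```python
import numpy as np

def replace_above_baseline(no2, baseline, n_sigma=3.0):
    """Return (cleaned, high): no2 with pixels far above the baseline
    replaced by the baseline value, plus the boolean mask of replaced pixels."""
    resid = no2 - baseline
    med = np.median(resid)
    # 1.4826 * MAD estimates the standard deviation for Gaussian noise
    robust_std = 1.4826 * np.median(np.abs(resid - med))
    high = resid > med + n_sigma * robust_std
    return np.where(high, baseline, no2), high

rng = np.random.default_rng(2)
yy, xx = np.mgrid[0:50, 0:50]
baseline = 1e-4 + 1e-6 * yy                     # stand-in for baseline_dt output
no2 = baseline + rng.normal(0, 1e-6, baseline.shape)
plume = (yy - 30) ** 2 + (xx - 30) ** 2 < 25    # hypothetical plume footprint
no2[plume] += 4e-5

cleaned, high = replace_above_baseline(no2, baseline)
```

The replaced pixels take the smooth baseline value; adding Gaussian noise with robust_std to them would make the filled regions blend in with the surrounding background.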
