I have one 2D array called `no2`, which is related to two other 2D arrays, `sza` and `vza`.
Test data (test.npz, 450 KB) can be downloaded from Google Drive.
Here's the overview:
```python
import numpy as np
import matplotlib.pyplot as plt

data = np.load('test.npz')
sza = data['sza']
vza = data['vza']
no2 = data['no2']

fig, axs = plt.subplots(2, 2, figsize=(8, 6))
ax1, ax2, ax3, ax4 = axs.flat

m = ax1.pcolormesh(no2)
plt.colorbar(m, ax=ax1)
ax1.set_title('no2')

m = ax2.pcolormesh(sza)
plt.colorbar(m, ax=ax2)
ax2.set_title('sza')

m = ax3.pcolormesh(vza)
plt.colorbar(m, ax=ax3)
ax3.set_title('vza')

s = ax4.scatter(sza, no2, c=vza, s=1)
plt.colorbar(s, ax=ax4, label='vza')
ax4.set_xlabel('sza')
ax4.set_ylabel('no2')
plt.tight_layout()
```
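The last subplot suggests a roughly linear relation between `sza` and `no2`. One quick way to put a number on that impression is a first-degree least-squares fit with `np.polyfit` plus the Pearson correlation coefficient. This is a minimal sketch on synthetic data (the real `test.npz` is not used here); `linear_fit` and the `_demo` arrays are hypothetical names for illustration.

```python
import numpy as np

def linear_fit(x, y):
    """Least-squares line y = slope*x + intercept, plus Pearson r; NaNs are ignored."""
    x = np.ravel(np.asarray(x, dtype=float))
    y = np.ravel(np.asarray(y, dtype=float))
    ok = np.isfinite(x) & np.isfinite(y)          # drop pixels with missing values
    slope, intercept = np.polyfit(x[ok], y[ok], 1)
    r = np.corrcoef(x[ok], y[ok])[0, 1]
    return slope, intercept, r

# Synthetic stand-in for the real arrays (assumption: not the data in test.npz)
rng = np.random.default_rng(0)
sza_demo = np.linspace(20, 70, 2000)
no2_demo = 0.5 * sza_demo + 3 + rng.normal(0, 0.5, sza_demo.size)
slope, intercept, r = linear_fit(sza_demo, no2_demo)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

On the real data this would be called as `linear_fit(sza, no2)`; an `r` close to 1 supports the linear assumption, although the high-`no2` pixels will pull both the line and `r` somewhat.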
I want to replace the two high-`no2` regions with values based on the surrounding background (low `no2`) values, to get something like this:
Because `no2` seems to depend linearly on `sza`, as shown in the last subplot, I came up with three ideas:
Curve fit
Fit `no2` against `sza` within several `vza` bins, and use the fit to calculate the background `no2` that replaces the high `no2` values:
```python
fig, axs = plt.subplots(3, 4, figsize=(12, 6))

# one panel per 5-degree vza bin: 5-10, 10-15, ..., 60-65
for ax, lo in zip(axs.flat, range(5, 65, 5)):
    # >= on the lower edge so values exactly on a bin boundary are not dropped
    mask = (vza >= lo) & (vza < lo + 5)
    s = ax.scatter(sza[mask], no2[mask], c=vza[mask], s=1)
    fig.colorbar(s, ax=ax, label='vza')
    ax.set_title(f'{lo}<=vza<{lo + 5}')
    ax.set_xlabel('sza')
    ax.set_ylabel('no2')

plt.tight_layout()
```
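Instead of fitting each panel by hand, the per-bin relation could be fitted for all `vza` bins in one pass and turned into a background map with the same shape as `no2`. The sketch below assumes the relation is linear within each bin (a straight line rather than the exponential tried below) and uses `np.digitize` for the binning; `binned_linear_background` is a hypothetical helper and the data are synthetic.

```python
import numpy as np

def binned_linear_background(sza, vza, no2, edges):
    """Fit no2 = a*sza + b separately in each vza bin and return the fitted background.

    Pixels whose vza falls outside [edges[0], edges[-1]) are left as NaN.
    """
    sza, vza, no2 = (np.asarray(a, dtype=float) for a in (sza, vza, no2))
    background = np.full(no2.shape, np.nan)
    idx = np.digitize(vza, edges)  # idx == i  <=>  edges[i-1] <= vza < edges[i]
    for i in range(1, len(edges)):
        in_bin = idx == i
        ok = in_bin & np.isfinite(sza) & np.isfinite(no2)
        if ok.sum() < 2:  # a straight line needs at least two points
            continue
        a, b = np.polyfit(sza[ok], no2[ok], 1)
        background[in_bin] = a * sza[in_bin] + b
    return background

# Synthetic example (assumption: not the real test.npz data)
rng = np.random.default_rng(1)
sza, vza = np.meshgrid(np.linspace(20, 70, 100), np.linspace(5, 65, 120), indexing="ij")
no2 = 0.5 * sza + 0.02 * vza + 2 + rng.normal(0, 0.3, sza.shape)
background = binned_linear_background(sza, vza, no2, edges=np.arange(5, 75, 5))
```

With the real arrays, `edges=np.arange(5, 70, 5)` would mirror the twelve panels above; pixels with `vza` outside that range would stay NaN. Because the high-`no2` pixels are included in each fit, they still bias the lines upward.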
I tried to fit the curve for one bin (45<vza<50):
```python
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

mask = (vza > 45) & (vza < 50)
xdata = sza[mask]
ydata = no2[mask]
popt, pcov = curve_fit(func, xdata, ydata, p0=(1, 1e-5, 1))

order = np.argsort(xdata)  # sort so the fitted curve is drawn as a single line
plt.plot(xdata, ydata, 'b.', markersize=1, label='data')
plt.plot(xdata[order], func(xdata[order], *popt), 'r-',
         label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
plt.legend()
```
However, the fit did not give what I wanted.
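A likely reason is that ordinary least squares treats the high-`no2` pixels as regular data, so they drag the fitted curve upward. One common remedy, swapped in here as an assumption rather than taken from the question, is one-sided sigma clipping: fit, reject points lying more than `k` residual standard deviations above the line, and refit until nothing changes. The sketch uses a straight line instead of the exponential, and `clipped_linear_fit`, `k` and `n_iter` are hypothetical names.

```python
import numpy as np

def clipped_linear_fit(x, y, k=2.5, n_iter=10):
    """Fit y = a*x + b while iteratively discarding points far ABOVE the line.

    Only positive residuals larger than k times the residual standard deviation
    are rejected, because the contamination here is high no2.
    Returns (a, b, keep), where keep flags the points used in the final fit.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.isfinite(x) & np.isfinite(y)
    for _ in range(n_iter):
        a, b = np.polyfit(x[keep], y[keep], 1)
        resid = y - (a * x + b)
        new_keep = keep & (resid < k * resid[keep].std())
        if np.array_equal(new_keep, keep):  # nothing more to reject
            break
        keep = new_keep
    a, b = np.polyfit(x[keep], y[keep], 1)  # final fit on the retained points
    return a, b, keep

# Synthetic example (assumption): a clean line plus 100 contaminated points
rng = np.random.default_rng(3)
x = rng.uniform(20, 70, 1500)
y = 0.5 * x + 3 + rng.normal(0, 0.3, x.size)
plume = rng.choice(x.size, 100, replace=False)
y[plume] += 5                          # synthetic high-no2 contamination
a_ls, b_ls = np.polyfit(x, y, 1)       # ordinary fit, pulled upward
a, b, keep = clipped_linear_fit(x, y)
```

For one `vza` bin of the real data this would be `clipped_linear_fit(sza[mask], no2[mask])`; the returned `keep` also marks which pixels the fit considers background.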
Is it possible to do both of the following?
- Fit the curve and use it to get background values for the high-value pixels
- Add random noise to the fitted background values (running this several times to get values that look more like the surrounding background)
Or is there a better method?
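Both conditions can be combined in a few lines once a background map and a mask of the high-value pixels exist. In this sketch the noise is not assumed Gaussian; it is resampled (with replacement) from the residuals `no2 - background` of the unmasked pixels, so the filled area inherits the scatter of its surroundings, and each seed gives a different realisation. `fill_with_noisy_background` is a hypothetical helper, and the background and mask are synthetic stand-ins for whatever fit and threshold are actually used.

```python
import numpy as np

def fill_with_noisy_background(no2, background, mask, rng=None):
    """Replace no2[mask] with background[mask] plus resampled background noise.

    The noise is drawn with replacement from the residuals no2 - background of
    the unmasked pixels; unmasked pixels are returned unchanged.
    """
    rng = np.random.default_rng(rng)
    resid = (no2 - background)[~mask]
    resid = resid[np.isfinite(resid)]
    filled = np.array(no2, dtype=float, copy=True)
    filled[mask] = background[mask] + rng.choice(resid, size=int(mask.sum()))
    return filled

# Synthetic example (assumption): linear background, noise, one high region
rng = np.random.default_rng(2)
sza = np.tile(np.linspace(20, 70, 80)[:, None], (1, 60))
background = 0.5 * sza + 3                 # stands in for the fitted background
no2 = background + rng.normal(0, 0.3, sza.shape)
mask = np.zeros(no2.shape, dtype=bool)
mask[20:35, 10:25] = True
no2[mask] += 5                             # synthetic high-no2 region

# several realisations, one per seed
realisations = [fill_with_noisy_background(no2, background, mask, rng=seed) for seed in range(3)]
```

Resampling residuals rather than drawing `np.random.normal` noise avoids having to assume a noise distribution, at the cost of reusing the same residual values.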
Gradient
I computed the gradient, hoping it would make the high-value regions stand out more:
```python
# https://stackoverflow.com/questions/34003993/generating-gradient-map-of-2d-array
grad = np.gradient(no2)
fulgrad = np.sqrt(grad[0]**2 + grad[1]**2)

fig, axs = plt.subplots(1, 2, figsize=(6, 3))
ax1, ax2 = axs.flat

m = ax1.pcolormesh(no2)
plt.colorbar(m, ax=ax1)
ax1.set_title('no2')

m = ax2.pcolormesh(fulgrad)
plt.colorbar(m, ax=ax2)
ax2.set_title('no2 gradient')
plt.tight_layout()
```
However, it only shows the outlines of the regions:
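Outlines can still be useful: if they form closed loops, thresholding the gradient magnitude and filling the enclosed holes turns them into a region mask. This sketch assumes the outlines are closed and uses `scipy.ndimage.binary_fill_holes`, then drops small connected components (isolated noisy pixels) with `scipy.ndimage.label`. `plume_mask_from_gradient`, `grad_threshold` and `min_size` are hypothetical names, and the threshold has to be tuned for the real data.

```python
import numpy as np
from scipy import ndimage

def plume_mask_from_gradient(field, grad_threshold, min_size=20):
    """Mask regions enclosed by strong-gradient outlines in a 2D field."""
    gy, gx = np.gradient(field)
    outlines = np.hypot(gy, gx) > grad_threshold   # the "outlines" from the plot
    filled = ndimage.binary_fill_holes(outlines)   # fill closed outlines
    labels, _ = ndimage.label(filled)              # connected components
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                                   # label 0 is the unmasked area
    return sizes[labels] >= min_size               # drop tiny noisy blobs

# Synthetic example (assumption): smooth trend + noise + one high plateau
rng = np.random.default_rng(4)
rows = np.arange(80)[:, None] * np.ones((1, 60))
no2 = 0.2 * rows + 3 + rng.normal(0, 0.3, rows.shape)
no2[20:35, 10:25] += 5
mask = plume_mask_from_gradient(no2, grad_threshold=1.5)
```

The resulting mask includes a one-pixel rim outside the plateau, which is usually harmless (or even desirable) when the masked pixels are about to be replaced.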
Image processing
I can't figure out how to use scikit-learn to replace only the high values while keeping the background unchanged.
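In image processing, "replace the masked pixels using their surroundings and leave everything else alone" is usually called inpainting. scikit-image (rather than scikit-learn) offers `skimage.restoration.inpaint_biharmonic(image, mask)` for this. To stay dependency-free, the sketch below swaps in a simpler technique, harmonic (Laplace) inpainting by Jacobi iteration in plain NumPy: each masked pixel is repeatedly set to the mean of its four neighbours while unmasked pixels stay fixed. `laplace_inpaint` and `n_iter` are hypothetical names, and the mask is assumed to come from a threshold or one of the ideas above.

```python
import numpy as np

def laplace_inpaint(field, mask, n_iter=2000):
    """Fill field[mask] smoothly from the unmasked pixels around it.

    Jacobi iteration for Laplace's equation: masked pixels become the mean of
    their four neighbours (edges are padded), unmasked pixels never change.
    """
    out = np.array(field, dtype=float, copy=True)
    out[mask] = np.nanmean(field[~mask])        # initial guess
    for _ in range(n_iter):
        p = np.pad(out, 1, mode="edge")
        avg = 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])
        out[mask] = avg[mask]
    return out

# Synthetic example (assumption): a linear background plane with one high region
rows, cols = np.mgrid[0:80, 0:60]
plane = 0.3 * rows + 0.1 * cols + 2.0
no2 = plane.copy()
mask = np.zeros(no2.shape, dtype=bool)
mask[20:35, 10:25] = True
no2[mask] += 5
filled = laplace_inpaint(no2, mask)
```

A linear background is recovered exactly inside the hole, but the fill is smooth, so noise would still need to be added afterwards to match the texture of the surrounding background.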