I have a table, built from a larger dataset, that gives me the x, y, and z values I want to plot. I create a grid and then interpolate the values, but I don't think the function is interpolating correctly along the rows: the plot shows the data breaking up into separate patches around latitude 78-80 (see image). Does anyone have a tip on how to smooth this data?

import numpy as np
import matplotlib.pyplot as plt

aou_df = df_1994.pivot_table(index='CTDPRS', columns='LATITUDE', values='AOU')
aou_df = aou_df.interpolate(method='linear', limit_area='inside', axis=0)
## Plotting AOU 1994
y = np.array([   4.2,    4.7,    4.8,    4.9,    5.4,    9.1,    9.6,    9.7,
                10.0,   10.1,
              ...
              3568.2, 3608.6, 3818.6, 3824.9, 3866.7, 3979.1, 4013.4, 4133.1,
              4159.3, 4287.3])

x = np.array([72.13,  73.0, 73.49, 73.98,  74.5,  75.0, 75.45, 75.75, 75.94,
              76.62, 77.33, 77.78, 78.14, 78.15, 78.98, 79.98, 80.15, 80.16,
              80.33, 80.71, 81.24, 81.58, 82.47, 83.17, 84.06, 84.85, 85.89,
              87.16, 88.06, 88.79, 88.86, 88.95, 89.02,  90.0])

z = np.array([[-12.29372749,          nan,          nan, ...,          nan,
                        nan,          nan],
              [         nan,          nan, -43.41465869, ...,          nan,
                        nan,          nan],
              [         nan, -54.49999783,          nan, ...,          nan,
                        nan,          nan],
              ...,
              [         nan,          nan,          nan, ...,          nan,
                        nan,          nan],
              [         nan,          nan,          nan, ...,          nan,
                55.87256821,          nan],
              [         nan,          nan,          nan, ...,  55.39665852,
                        nan,  55.05005376]])


xi, yi = np.meshgrid(x,y,indexing='ij')

#from matplotlib.colors import LogNorm
plt.figure(figsize=(25,10))
levels = np.linspace(-135,135)
#cbar = plt.colorbar(ticks=(-85,-65,-45,-25,-5,15,35,55,75,95, 115,135))

plt.contourf(xi,yi,z, cmap = 'jet', levels=levels,vmin=-135, vmax=135)
plt.gca().invert_yaxis()
plt.gca().invert_xaxis()
cbar = plt.colorbar(ticks=(-135,-110,-85,-60,-35,0,35,60,85,110,135), extend= 'both')
cbar.set_label('AOU', fontsize=18)
cbar.ax.tick_params(labelsize=18)
plt.xlabel('LAT',fontsize=18)
plt.ylabel('Pressure (dbar)' ,fontsize=18)
plt.ylim(bottom = 1000)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.plot(x,range(len(x)),'gD', clip_on=False, markersize=10)
#plt.xlim(left = 80)

[image: contour plot of AOU against latitude and pressure]
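One possible reason for the patchy look, offered as a guess rather than a diagnosis: `interpolate(..., axis=0)` only fills gaps down each column (along pressure), so a latitude column with no data at a given depth range stays empty. The toy table below uses made-up numbers and a hypothetical name `toy`; it is only a sketch of the difference between a single `axis=0` pass and a second pass with `axis=1`.

```python
import numpy as np
import pandas as pd

# Toy pivot table (made-up values): rows are pressure levels, columns are latitudes.
toy = pd.DataFrame(
    [[1.0, np.nan, 3.0],
     [np.nan, np.nan, np.nan],
     [5.0, np.nan, 7.0]],
    index=[10.0, 20.0, 30.0],      # pressure
    columns=[78.0, 79.0, 80.0],    # latitude
)

# axis=0 fills gaps down each column (along pressure) only,
# so the empty latitude column 79.0 stays empty.
down = toy.interpolate(method="linear", limit_area="inside", axis=0)

# A second pass with axis=1 fills gaps across each row (along latitude).
both = down.interpolate(method="linear", limit_area="inside", axis=1)

print(down)
print(both)
```

Keep in mind that `method='linear'` treats the values as equally spaced, so with unevenly spaced latitudes the second pass does not weight by actual distance.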

  • This question is not reproducible without **data**. This question needs a complete [SSCCE](http://sscce.org/). Please see [How to provide a reproducible dataframe](https://stackoverflow.com/q/52413246/7758804), then **[edit] your question**, and paste the clipboard into a code block. Always provide a [mre] **with code, data, errors, current output, and expected output, as [formatted text](https://stackoverflow.com/help/formatting)**. If relevant, plot images are okay. If you don't include a mre, it is likely the question will be downvoted, closed, and deleted. – Trenton McKinney May 24 '23 at 20:49

1 Answer

OK, the values are in a table; sorry, I have updated the code. Try running it.

Try this. You can also pass method='linear' or method='cubic' to griddata.

import numpy as np
from scipy.interpolate import griddata

z = np.asarray(z, dtype=float)  # the z[i, j] indexing below needs a NumPy array

### getting the valid data
xdata, ydata = np.meshgrid(x, y)
not_nan = ~np.isnan(xdata) & ~np.isnan(ydata) & ~np.isnan(z)
i, j = np.argwhere(not_nan).T
xval = xdata[i, j]
yval = ydata[i, j]
zval = z[i, j]
xy = np.column_stack((xval, yval))

### mesh
nx_mesh = 100
ny_mesh = 100
xi, yi = np.meshgrid(np.linspace(xval.min(), xval.max(), nx_mesh), 
                     np.linspace(yval.min(), yval.max(), ny_mesh))

### interpolation
zi = griddata(xy, zval, (xi, yi), method='nearest')

### Pass xi, yi and zi to contour....
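As a rough illustration of how the `method` choice changes what ends up in `zi`, here is a small sketch on synthetic points (the names `pts`, `vals`, `gx`, `gy` are made up for this example and are not from the question). It shows that 'nearest' fills every grid cell, while 'linear' returns NaN for grid cells outside the convex hull of the samples.

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic scattered samples (hypothetical stand-in for xy / zval).
rng = np.random.default_rng(0)
pts = rng.uniform(0.2, 0.8, size=(50, 2))
vals = pts[:, 0] + 2.0 * pts[:, 1]          # a simple plane

# Target grid that deliberately extends beyond the sampled region.
gx, gy = np.meshgrid(np.linspace(0, 1, 21), np.linspace(0, 1, 21))

z_nearest = griddata(pts, vals, (gx, gy), method="nearest")
z_linear = griddata(pts, vals, (gx, gy), method="linear")

# 'nearest' extrapolates everywhere; 'linear' leaves NaN outside the convex hull.
n_nan_nearest = int(np.isnan(z_nearest).sum())
n_nan_linear = int(np.isnan(z_linear).sum())
print(n_nan_nearest, n_nan_linear)
```

If the 'linear' result is used, the NaN cells are typically left blank by `contourf`, which can be preferable to extrapolated values far from any measurement.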

EDIT: This is the result from Joao's code using method='nearest': [image]

This is the 'linear' result: [image]

Joao_PS
  • Hi Joao! Thanks for your input! I tried the code and it gives the error `ValueError: operands could not be broadcast together with shapes (34,) (838,)` when trying to run the `i, j = np.argwhere(not_nan).T` line. My arrays have shapes x (34,) and y (838,) because I am pulling the values out of a pivot table. – Maria Cristina Alvarez May 25 '23 at 16:57
  • I now realize that `x` and `y` are 1D but `z` is 2D... Does vector `z` have 34 x 838 = 28492 elements? I updated the code... – Joao_PS May 25 '23 at 18:17
  • Yes, z has shape (838, 34), with 28492 elements. This code works great! The only issue is that the 'nearest' method interpolates all the way down to my 4000 value on the y-axis, while the data at some x values do not go that deep. Do you know if there is a way to limit the area where it interpolates, the same way we can with `pd.DataFrame.interpolate(limit_area='inside')`? Or a way to limit the size of the NaN gap it will fill? I would prefer to interpolate using the 'linear' method, but it looks kind of funky (see the images attached to your answer). – Maria Cristina Alvarez May 25 '23 at 18:54
  • I have added the result from your code :) – Maria Cristina Alvarez May 25 '23 at 18:55
  • I think `DataFrame.interpolate` is applicable for 1D only. You could try `scipy.interpolate.SmoothBivariateSpline`... Anyway, if there is too little data for x < 76 then the interpolation is not good. See the data distribution in this example: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.griddata.html#scipy.interpolate.griddata Note that the data (black dots) are well distributed... – Joao_PS May 26 '23 at 17:09
  • You could also try interpolating only a part of the data... maybe x < 80... and using `plt.contourf` twice. For that, it is necessary to pass the `extent` option inside `contourf`. To plot data x >= 80, use `extent=(80, x.max(), y.min(), y.max())`. To plot the interpolated points x< 80, use `extent=(x.min(), 80, y.min(), y.max())`. – Joao_PS May 26 '23 at 17:26
  • Tip: check the source of your data or the sheet where it's stored... Look for inconsistent data or values like text characters, white or blank space, semicolons, tabs, quotes and so on... – Joao_PS May 26 '23 at 17:30
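On the question in the comments about limiting where 'nearest' fills in values: one possible approach, sketched here on made-up data, is to blank out grid cells that lie deeper than the deepest measurement at the neighbouring stations. This is not a built-in `griddata` option; the arrays below and the names `deepest`, `bottom`, and `zi_masked` are assumptions for illustration only. The sketch assumes `x` is sorted in ascending order and that every station has at least one valid value.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical stand-ins for the question's arrays: x = latitudes, y = pressures,
# z = values with shape (len(y), len(x)) and NaN where nothing was measured.
x = np.array([78.0, 79.0, 80.0, 81.0])
y = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
z = np.add.outer(y, x)                 # made-up smooth field
z[3:, 0] = np.nan                      # station at 78 stops at 200 dbar
z[4:, 1] = np.nan                      # station at 79 stops at 300 dbar

# Scattered valid samples, as in the answer above.
xdata, ydata = np.meshgrid(x, y)
ok = ~np.isnan(z)
xy = np.column_stack((xdata[ok], ydata[ok]))

# Regular target grid and a 'nearest' fill, which extrapolates everywhere.
xi, yi = np.meshgrid(np.linspace(x.min(), x.max(), 31),
                     np.linspace(y.min(), y.max(), 41))
zi = griddata(xy, z[ok], (xi, yi), method="nearest")

# Deepest valid pressure at each station, linearly interpolated across latitude.
deepest = np.array([y[col].max() for col in ok.T])
bottom = np.interp(xi[0], x, deepest)

# Blank every grid cell that lies below the interpolated bottom of the data.
zi_masked = np.where(yi <= bottom, zi, np.nan)
```

Passing `zi_masked` instead of `zi` to `contourf` should leave the region below the data empty, since NaN cells are normally not drawn.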