I have a pandas data frame (90720 rows) consisting of longitude, latitude, and variable columns. The data represent points on a 1.3 km resolution grid but are not in any particular order within the data frame. An example of the dataset looks like:

lon lat var
40.601700 -90.078857 0.006614
40.598942 -90.031372 0.048215
40.592426 -89.920563 0.012860
40.591480 -89.904724 0.006642
40.590546 -89.888916 0.005383
43.642635 -89.904724 0.012860
40.590546 -84.545715 0.012860

I would like to convert these lat/lon/var points into a gridded dataset. Most approaches I have tried (df.pivot) require significant memory due to the size of the data frame. The final gridded data should have a shape of (288,315). Ultimately, I want to plot this data with plt.colormesh() to compare it with other datasets. I appreciate any suggestions!

OCa
VAL
  • The way to bucket data for this would be to normalize each coordinate to a value in [0, 1] and then multiply that by your max grid size for that axis. Since x_norm(x) = (x - x_min) / (x_max - x_min), you'll need to know the max and min values for each column; then you should be able to handle each row one at a time by applying a norm function to lat and lon. Keep in mind that the Earth is not a flat plane, and simple normalization like this will get weird near the poles. Consider [S2 Buckets](http://s2geometry.io/) – VoNWooDSoN Aug 07 '23 at 21:36
  • What should the output data look like, e.g. after this pivoting you did? – OCa Aug 07 '23 at 21:56
  • I can plot the data using geopandas. I would like to be able to work with it as a gridded dataset, so I can compare it with other datasets (calculate means/std devs/etc.) spatially. I am sure there are other workarounds, but this is where my brain can make sense of things, I guess. I'd like the output to be a 2-D array where the arr[x,y] value corresponds to the lat/lon location. The data is on a curvilinear grid, so calculating a lat/lon range and dividing it evenly doesn't quite work and degrades the resolution a bit. Are there any other options? – VAL Aug 09 '23 at 18:58
  • This is kind of what I was thinking, but I know there are likely more efficient approaches:
        lons = np.array(df['lon'].unique())
        lats = np.array(df['lat'].unique())
        df_2d = []
        for i in lons:
            for j in lats:
                for ind in df.index:
                    if (df['lon'][ind] == i) & (df['lat'][ind] == j):
                        df_2d[i,j] = df['var'][ind]
    – VAL Aug 09 '23 at 19:41
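
The bucketing idea from the first comment can be vectorized so that no per-row loop is needed. The following is only a minimal sketch under stated assumptions: `bin_to_grid` is a hypothetical helper (not from the thread), points are rounded to the nearest bucket, later points overwrite earlier ones that land in the same cell, and empty cells stay NaN. On a curvilinear grid this will still lose some resolution, as noted in the comments. With a real data frame, the inputs would be something like `df['lon'].to_numpy()`.

```python
import numpy as np

def bin_to_grid(lon, lat, var, nx, ny):
    """Hypothetical helper: place scattered points on an (nx, ny) array
    by min-max normalizing each coordinate and rounding to a bucket index."""
    lon, lat, var = (np.asarray(a, dtype=float) for a in (lon, lat, var))
    # Normalize to [0, 1], scale to [0, n - 1], round to the nearest bucket.
    # Assumes each coordinate has at least two distinct values (no zero range).
    ix = np.rint((lon - lon.min()) / (lon.max() - lon.min()) * (nx - 1)).astype(int)
    iy = np.rint((lat - lat.min()) / (lat.max() - lat.min()) * (ny - 1)).astype(int)
    grid = np.full((nx, ny), np.nan)  # cells without data stay NaN
    grid[ix, iy] = var                # points sharing a cell overwrite each other
    return grid

# The sample rows from the question
lon = [40.601700, 40.598942, 40.592426, 40.591480, 40.590546, 43.642635, 40.590546]
lat = [-90.078857, -90.031372, -89.920563, -89.904724, -89.888916, -89.904724, -84.545715]
var = [0.006614, 0.048215, 0.012860, 0.006642, 0.005383, 0.012860, 0.012860]

grid = bin_to_grid(lon, lat, var, nx=288, ny=315)
print(grid.shape)
```

Because the array is filled with one fancy-indexed assignment, memory use is about one float per grid cell plus two integer index arrays, which is far less than a pivot over every unique lon/lat value.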

1 Answer

This is strongly inspired by "Resampling irregularly spaced data to a regular grid in Python", a question from 12 years ago. I updated the code to work in current Python and improved its readability. It uses plt.pcolormesh, which I suppose is what you meant by plt.colormesh.

With df the dataframe of your suggested input data:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

# grid sizing
# number of grid points:
nx, ny = 288, 315
# grid window
xmin, xmax = 40, 45
ymin, ymax = -91, -84

# Generate a regular grid to interpolate the data.
X, Y = np.meshgrid(np.linspace(xmin, xmax, nx), 
                   np.linspace(ymin, ymax, ny))

# Interpolate using "cubic" method
Z = griddata(points = (df.lon, df.lat),
             values = df['var'],
             xi = (X, Y),
             method = 'cubic')

# Plot the results
plt.figure()
plt.pcolormesh(X, Y, Z)
plt.scatter(x=df.lon, y=df.lat, c=df['var'])
plt.colorbar()
plt.axis([xmin, xmax, ymin, ymax])
plt.xlabel('longitude')
plt.ylabel('latitude')
plt.title('Overlay: Scatter and Grid')
plt.show()

With your initial dataset as circles:

[Figure: gridded scatter, cubic interpolation]
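
Two details of the result may matter when comparing it with other datasets. With the default `indexing='xy'`, `np.meshgrid` returns arrays of shape (ny, nx), so the grid above is (315, 288) rather than the requested (288, 315). Also, `griddata` with `method='cubic'` leaves NaN outside the convex hull of the input points, so NaN-aware statistics are needed. The sketch below uses only NumPy; the field `Z` is a made-up stand-in for the interpolated result, not real data.

```python
import numpy as np

nx, ny = 288, 315
x = np.linspace(40, 45, nx)
y = np.linspace(-91, -84, ny)

# Default indexing='xy' gives shape (ny, nx); indexing='ij' gives (nx, ny).
X_xy, Y_xy = np.meshgrid(x, y)
X_ij, Y_ij = np.meshgrid(x, y, indexing='ij')
print(X_xy.shape, X_ij.shape)

# Made-up stand-in for the interpolated field, with a NaN band
# mimicking cells that fall outside the convex hull of the data.
Z = np.sin(X_xy) * np.cos(Y_xy)
Z[:10, :] = np.nan

# Transpose to get the (288, 315) layout requested in the question.
Z_T = Z.T

# NaN-aware statistics ignore the unfilled cells.
z_mean = np.nanmean(Z_T)
z_std = np.nanstd(Z_T)
print(Z_T.shape, z_mean, z_std)
```

Either layout works with `plt.pcolormesh` as long as X, Y, and Z share the same orientation.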

OCa