1

I have a netcdf file with t_ave at a res. of 0.25x0.25. I want to assign t_ave values to any administrative area using a spatial join, but my areas are very small and at that resolution many of them are not covered by the grid.

I tried to resample the netcdf to a smaller grid (0.01x0.01). It works but it takes too time and the output files are huge...not an effective solution.

It would be better to directly interpolate the original netcdf grid at 0.25 (or after a soft resampling) on centroids of admin. areas but I didn't find any specific example. I am using Python 3.9 and centroids would come from a csv file with standard coordinates (see commented code below).

Any suggestion please?

import pandas as pd
import geopandas as gpd
import xarray as xr
import nctoolkit as nc
#------------ RESAMPLING ------------
data = nc.open_data(mypath + "myfile.nc")
print(data)
data.to_latlon(lon = [6, 19], lat = [36, 48], res = [0.01,0.01])
df=data.to_dataframe()
data.to_nc("myfile_regrid.nc", overwrite=True)
#---------------- SPATIAL JOIN USING RESAMPLED NETCDF -----------------
grid_nc = xr.open_dataset("myfile_regrid.nc")
# converting to geodataframe
xarr = grid_nc
df = xarr.to_dataframe().reset_index()
gdf_ncfile = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.lon, df.lat, crs="EPSG:4326"))
#############################################
##### if I use a list of centroids it would be: #####
#centroids = pd.read_csv("/Users/sandor/Documents/pythonProject/gis/centroids_xy_2015.csv", delimiter=";")
#print(centroids)
#gdf_centroids = gpd.GeoDataFrame(data=centroids, geometry=gpd.points_from_xy(centroids.x, centroids.y, crs="EPSG:4326"))
#############################################
# preparing admin areas to be joined to nc grid
cities = gpd.read_file("admin_areas.geojson")
cities=cities.explode()
# running spatial join
spatial_join = gpd.sjoin(gdf_ncfile, municipalities, op="within")
#-------------- EXPORT JOINED CSV --------------")
spatial_join.to_csv("myfile_regrid.csv")
  • This is a good case for advanced indexing. See this post: https://stackoverflow.com/questions/69330668/efficient-way-to-extract-data-from-netcdf-files/69337183#69337183. Just treat the centroids as points to interpolate to and select the points using data array indexers. – Michael Delgado Oct 11 '22 at 02:02

1 Answers1

2

Use xarray’s advanced indexing, which works for selection with .sel or .isel as well as interpolation with .interp:

points = centroids.to_xarray()
df = grid_nc.interp(
    lat=points.y, lon=points.x
).to_dataframe()

This will result in a pandas DataFrame containing the data of grid_nc but indexed by the index of points. Any additional dimensions, if there are any, such as time, z, etc, will be part of the index as well as a MultiIndex with the combinatorial product of the remaining coordinates.

Michael Delgado
  • 13,789
  • 3
  • 29
  • 54