Another solution to the problem of applying a multitemporal groupby to a netCDF file with the xarray library is to combine the xarray DataArray method "resample" with the "groupby" method. The same approach also works for xarray Dataset objects.
With this approach, one can compute values such as monthly-hourly means or other kinds of temporal aggregation (e.g., annual-monthly means, biannual three-monthly sums, etc.).
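To make the monthly-hourly case concrete, here is a minimal sketch on synthetic hourly data. The series, the variable name temp, and the dates are hypothetical (the 'rasm' data used further down is monthly, so it has no hours to group). "resample" splits the series into monthly bins, and inside each bin "groupby" on the hour of day averages all values that share the same hour.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical hourly series for January and February 2021.
# Each value is 10 + hour of day, so every hourly mean is known in advance.
time = pd.date_range("2021-01-01", "2021-02-28 23:00", freq="h")
temp = xr.DataArray(
    10.0 + time.hour.to_numpy(),
    coords={"time": time},
    dims="time",
    name="temp",
)

# resample(time="MS") yields (month label, sub-array) pairs, one per month;
# groupby("time.hour").mean() then averages the values sharing an hour.
monthly_hourly = {
    label: month.groupby("time.hour").mean()
    for label, month in temp.resample(time="MS")
}

for label, profile in monthly_hourly.items():
    print(pd.Timestamp(label).strftime("%Y-%m"), profile.sel(hour=[0, 12, 23]).values)
```

The result is one 24-value diurnal profile per month; swapping "MS" for another frequency changes the outer aggregation period.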
The example below uses the standard xarray tutorial dataset 'rasm', which contains monthly air temperature (Tair). Note that I had to convert the time coordinate of the tutorial data into a pandas DatetimeIndex. The file uses a non-standard (no-leap) calendar, so xarray does not decode its times into a DatetimeIndex; without the conversion, the resample call failed (in the xarray version I used) with the following error:
Error message:
"TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'"
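A minimal sketch of the conversion on synthetic data, assuming the time labels arrive as plain strings so that xarray holds them in a generic pandas Index instead of a DatetimeIndex (the dataset and its values are hypothetical). It applies the same idea as the parse_datetime helper in the snippet: turn each label into text and let pandas parse the whole list.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset: twelve monthly values whose time labels are strings,
# so the "time" index is a generic pandas Index, not a DatetimeIndex.
labels = [f"2000-{m:02d}-15" for m in range(1, 13)]
ds = xr.Dataset(
    {"Tair": ("time", np.arange(12.0))},
    coords={"time": labels},
)

# Parse every label into a pandas DatetimeIndex and replace the coordinate.
ds = ds.assign_coords(time=pd.to_datetime([str(t) for t in ds["time"].values]))

# With a DatetimeIndex in place, resampling into 3-month bins works.
seasonal = ds.resample(time="3MS").mean()
print(seasonal["Tair"].values)
```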
Setting that time-index problem aside (it could be a separate Stack Overflow question), the code below presents two possible solutions to the multitemporal grouping problem for xarray objects. The first instantiates the xarray.core.groupby.DataArrayGroupBy class directly with a pandas Grouper, while the second only uses the groupby method of regular xarray DataArray and Dataset objects.
Sincerely yours,
Philipe Riskalla Leal
Code snippet:
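import pandas as pd
import xarray as xr

ds = xr.tutorial.open_dataset('rasm').load()

def parse_datetime(time):
    # Parse each time label as text into a pandas DatetimeIndex (needed by resample).
    return pd.to_datetime([str(x) for x in time])

ds.coords['time'] = parse_datetime(ds.coords['time'].values)

# Option 1 for multitemporal aggregation:
# yearly groups from pd.Grouper, passed to an internal xarray class written
# against an older xarray; its constructor has changed in later releases.
time_grouper = pd.Grouper(freq='Y')
grouped = xr.core.groupby.DataArrayGroupBy(ds, 'time', grouper=time_grouper)
for idx, sub_da in grouped:
    print(sub_da.resample({'time': '3M'}).mean().coords)

# Option 2 for multitemporal aggregation:
# yearly groups from the regular groupby method, each resampled into 3-month bins.
grouped = ds.groupby('time.year')
for idx, sub_da in grouped:
    print(sub_da.resample({'time': '3M'}).mean().coords)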
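Because option 1 depends on an internal class, here is a sketch of the same yearly split using only public API, on a synthetic monthly series (hypothetical values) covering September 1980 to August 1983, roughly the span of the 'rasm' data. Iterating over ds.resample(time="YS") yields one sub-dataset per calendar year, each year is resampled into 3-month bins, and xr.concat stitches the pieces back into a single Dataset. The start-anchored aliases "YS" and "3MS" are chosen here to sidestep the deprecation of "Y" and "M" in recent pandas releases.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series: 36 months, values 0..35.
time = pd.date_range("1980-09-01", "1983-08-01", freq="MS")
ds = xr.Dataset(
    {"Tair": ("time", np.arange(time.size, dtype=float))},
    coords={"time": time},
)

# Iterating a yearly resample yields (year-start label, sub-dataset) pairs,
# the public counterpart of grouping with pd.Grouper(freq=...).
yearly_parts = []
for year_start, sub_ds in ds.resample(time="YS"):
    # Within each calendar year, aggregate into 3-month bins.
    yearly_parts.append(sub_ds.resample(time="3MS").mean())

# Stitch the per-year results back together along "time".
combined = xr.concat(yearly_parts, dim="time")
print(combined["Tair"].to_series())
```

Bins are clipped at each year boundary (December 1980, for example, forms its own bin), which is the intended effect of grouping by year before resampling. Iterating over ds.groupby("time.year") instead produces the same partition here.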