To speed up my xarray calculations, I tried adding numba's `@guvectorize` decorator to my functions, but I ran into several problems:
- I wrote two functions, `read_pr` and `day_clim`. Because the guvectorize signature is set to `float64[:], float64[:]`, the input of `day_clim` is no longer an xarray object, so `groupby` does not work. I also tried `xr.core.dataarray.DataArray[:], xr.core.dataarray.DataArray[:]`, but that raises `NameError: name 'xr' is not defined`.
- I would like to apply `@guvectorize` to `read_pr`, too. However, guvectorize needs the types and the layout (the shape of each argument) declared up front, and every dimension in the output has to appear in one of the inputs. For example, `(m),(n),(n) -> (m,n)` is fine, but `(n),() -> (m,n)` raises an error. The inputs of `read_pr` are strings and scalars (shape `()`), while the output is an xarray `DataArray` (type `<class 'xarray.core.dataarray.DataArray'>`, shape `(l,m,n)`).
Code:
from numba import float64, guvectorize
import numba
import numpy as np
import xarray as xr
path = '/data3/USERS/waynetsai/pyaos_wks_samples/data/'
fname = 'cmorph_sample.nc'
lats = -20
latn = 30
lon1 = 89
lon2 = 171
time1 = '2000-01-01'
time2 = '2020-12-31'
def read_pr(path, fname, time1, time2, lats, latn, lon1, lon2):
    with xr.open_dataset(path + fname) as pr_ds:
        pr = (pr_ds.sel(time=slice(time1, time2),
                        lat=slice(lats, latn),
                        lon=slice(lon1, lon2)).cmorph)
    return pr
pr = xr.apply_ufunc(read_pr, path, fname, time1, time2, lats, latn, lon1, lon2)
@guvectorize(
    "(float64[:], float64[:])",
    "(l,m,n) -> (l,m,n)"
)
def day_clim(pr):
    prGB = pr.groupby("time.day")
    prDayClim = prGB.mean("time")
    return prDayClim
prDayClim = xr.apply_ufunc(day_clim, pr)
All suggestions are welcome!