I have many daily NetCDF files produced by a hydrological model, and I want to aggregate them to monthly or yearly resolution by either summing or averaging. For this, I use the following code:

import xarray as xr
    
nc_file = r'J:\RESULTS\WB_PRECIPITATION.nc'
ds = xr.open_dataset(nc_file)
monthly_data=ds.resample(time='Y',skipna=True).sum()
output = r'J:\RESULTS\WB_PRECIPITATION_YEARLY.nc'
monthly_data.to_netcdf(output, engine="netcdf4")

The problem is that my original daily file has several zones filled with NaN (_FillValue=-9999), and in the new NetCDF those zones end up with the value 0. That distorts all subsequent calculations.

I have already tried the "skipna" parameter with both True and False and got the same result.

When I had the same problem in pandas I used the following code, but I have not been able to adapt it to this situation.

import numpy as np
import pandas as pd 

def very_sum(array_like):
    if any(pd.isnull(array_like)):
        return np.nan
    else:
        return array_like.sum()

df = ... 
df_yearly = df.resample('Y').apply(very_sum)

How can I resample my data without losing the zones with NaN?

  • I suggest `cdo monsum` command using [CDO tools](https://code.mpimet.mpg.de/projects/cdo). A related example is here, except that you don't need to create the mask: https://stackoverflow.com/questions/61378478/monthly-sum-of-wet-days-from-daily-data-using-climate-data-operators-cdo/61776220#61776220 – Robert Davy May 14 '21 at 05:06

1 Answer

I think you only misplaced the skipna keyword: it belongs in the reduction method (sum) rather than in resample, and it has to be False so that NaN values propagate into the result. This is basically a duplicate of: xarray resampling with certain nan treatment

So instead of:

monthly_data=ds.resample(time='Y',skipna=True).sum()

Just do:

monthly_data=ds.resample(time='Y').sum(skipna=False)
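To make the effect on gridded data concrete, here is a minimal sketch on a synthetic Dataset. The variable name `precipitation`, the `zone` dimension and the values are all made up for illustration and are not taken from the original file. It uses the "YS" (year start) alias, which labels each bin by its first day, as one frequency string that should be accepted across pandas versions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the daily model output: 2 years x 3 zones.
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
precip = np.ones((time.size, 3))
precip[:, 0] = np.nan    # zone 0 is masked on every day
precip[:10, 1] = np.nan  # zone 1 has a 10-day gap at the start of 2000

ds = xr.Dataset(
    {"precipitation": (("time", "zone"), precip)},
    coords={"time": time, "zone": [0, 1, 2]},
)

# skipna goes to the reduction, not to resample().
yearly_sum = ds.resample(time="YS").sum(skipna=False)
yearly_mean = ds.resample(time="YS").mean(skipna=False)

print(yearly_sum["precipitation"].transpose("time", "zone").values)
```

With skipna=False, any year that contains at least one NaN day becomes NaN for that zone, so the fully masked zone stays NaN in both years and the zone with the gap is NaN only for 2000.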

As a runnable example:

import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", "2000-12-31")
da = xr.DataArray(data=np.ones(time.size), coords={"time": time}, dims=["time"])
da.data[:45] = np.nan

Default:

da.resample(time="m").sum()

<xarray.DataArray (time: 12)>
array([ 0., 15., 31., 30., 31., 30., 31., 31., 30., 31., 30., 31.])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-31 2000-02-29 ... 2000-12-31

skipna=False:

da.resample(time="m").sum(skipna=False)

<xarray.DataArray (time: 12)>
array([nan, nan, 31., 30., 31., 30., 31., 31., 30., 31., 30., 31.])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-31 2000-02-29 ... 2000-12-31
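If a single missing day should not invalidate the whole period, a possible alternative is the `min_count` argument of `sum`, which returns NaN only when fewer than `min_count` valid values are present. The sketch below reuses the same synthetic series and assumes an xarray version that accepts `min_count` on resample reductions. It uses the "MS" (month start) alias, so the bins are labelled by their first day rather than their last.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
values = np.ones(time.size)
values[:45] = np.nan  # all of January plus the first 14 days of February
da = xr.DataArray(values, coords={"time": time}, dims=["time"])

# A month is NaN only if it has no valid day at all.
monthly = da.resample(time="MS").sum(min_count=1)

print(monthly.values)
```

Here January, which has no valid data, stays NaN, while February is summed over its 15 valid days. Whether that is preferable to skipna=False depends on whether partially missing periods should count.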
Huite Bootsma