Problem:
I'd like to resample an xarray dataset (e.g. with sum or mean) so that each resulting value is NaN whenever at least one of its input values is NaN. With pandas I can easily apply my own mean, sum, etc. function that gives me my preferred NaN treatment. xarray also allows resample.apply(own_func), but I have trouble defining that function.

Example (from xarray's documentation):

import numpy as np
import pandas as pd
import xarray as xr

dat = np.linspace(0, 11, 12)
dat[2] = np.nan
da = xr.DataArray(dat,
                  coords=[pd.date_range('15/12/1999',
                                        periods=12,
                                        freq=pd.DateOffset(months=1))],
                  dims='time')

da.resample(time="QS-DEC").sum()

What I get:

<xarray.DataArray (time: 4)>
array([ 1., 12., 21., 30.])
Coordinates:
  * time     (time) datetime64[ns] 1999-12-01 2000-03-01 2000-06-01 2000-09-01

Following @JulianGiles's answer (with skipna passed to resample):

da.resample(time="QS-DEC",skipna=False).mean()
<xarray.DataArray (time: 4)>
array([ 0.5,  4. ,  7. , 10. ])
Coordinates:
  * time     (time) datetime64[ns] 1999-12-01 2000-03-01 2000-06-01 2000-09-01

What I want:

<xarray.DataArray (time: 4)>
array([ 1., NAN, 21., 30.])
Coordinates:
  * time     (time) datetime64[ns] 1999-12-01 2000-03-01 2000-06-01 2000-09-01
mgraf

2 Answers

As the documentation says (http://xarray.pydata.org/en/stable/generated/xarray.Dataset.resample.html), you can specify skipna depending on how you want NaNs to be handled.

In your case, specifying skipna=False will do it. Since resample was recently modified to defer calculations, you can do it in two ways:

da.resample(time="QS-DEC").sum(skipna=False)

or the old way (where you put everything inside the .resample()):

da.resample("QS-DEC", 'time', how='sum', skipna=False)
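A minimal, self-contained sketch of the first form, assuming a reasonably recent xarray in which the reduction methods on a resample object accept skipna. It rebuilds the data from the question (written with an ISO date string to avoid day/month ambiguity) and compares the default NaN handling with skipna=False:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the example data: 12 monthly values with one NaN (February 2000).
values = np.linspace(0, 11, 12)
values[2] = np.nan
time = pd.date_range("1999-12-15", periods=12, freq=pd.DateOffset(months=1))
da = xr.DataArray(values, coords=[time], dims="time")

# Default for float data: NaN is skipped inside each quarter.
sum_skip = da.resample(time="QS-DEC").sum()

# skipna=False on the reduction itself: a quarter containing NaN becomes NaN.
sum_strict = da.resample(time="QS-DEC").sum(skipna=False)
mean_strict = da.resample(time="QS-DEC").mean(skipna=False)

print(sum_skip.values)
print(sum_strict.values)
print(mean_strict.values)
```

Only the quarter that actually contains the NaN (December to February) is affected; the other three quarters are identical in both variants.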
JulianGiles
  • No, `skipna` says: _Whether to skip missing values when aggregating in downsampling._ My example is about upsampling from monthly to seasonal data. – mgraf Feb 01 '19 at 07:49
  • No, you are misunderstanding the word 'downsampling'. If you go from monthly to seasonal data you are going from a higher resolution to a lower resolution, that is, you are downsampling. Upsampling would be to increase resolution, for example, going from seasonal to monthly resolution. – JulianGiles Feb 02 '19 at 17:16
  • Thanks, I mixed up those two! Still the `skipna` argument does not solve my problem - I extended the questions to show this. – mgraf Feb 04 '19 at 06:48
  • Apologies for the late response. You have to put the `skipna` argument inside the .sum() operation for it to work. I'll modify my answer to be more clear. – JulianGiles Feb 08 '19 at 14:29

You can use the combination of xarray resample and reduce:

# Dummy function to see how the array is grouped
def func(x, axis):  # reduce expects a function that takes an axis argument
    print(x)  # show the values in each group
    return x.sum(axis=axis)  # the result is not relevant here; it only needs the reduced shape

da.resample(time="QS-DEC").reduce(func)

The NaN is in the first quarter (not in the second, as you expect):

[ 0.  1. nan]
[3. 4. 5.]
[6. 7. 8.]
[ 9. 10. 11.]

So with np.sum() the NaN shows up in the first quarter of the output:

import numpy as np
da.resample(time="QS-DEC").reduce(np.sum)
<xarray.DataArray (time: 4)>
array([nan, 12., 21., 30.])
Coordinates:
  * time     (time) datetime64[ns] 1999-12-01 2000-03-01 2000-06-01 2000-09-01

If you want to ignore the NaN instead, simply use np.nansum():

da.resample(time="QS-DEC").reduce(np.nansum)
<xarray.DataArray (time: 4)>
array([ 1., 12., 21., 30.])
Coordinates:
  * time     (time) datetime64[ns] 1999-12-01 2000-03-01 2000-06-01 2000-09-01

The same applies to np.mean(), np.nanmean(), np.std(), np.nanstd(), etc.

For more complex functions that are used with reduce, you can see this answer: https://stackoverflow.com/a/60627663/6841963
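As a rough sketch of what a custom reducer for reduce can look like, the function below sums each group but returns NaN when more than a chosen number of values are missing. The names sum_allowing_missing and max_missing are made up for this illustration; the sketch relies on reduce calling the function with an axis argument and forwarding extra keyword arguments to it.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same data as in the question: 12 monthly values, one of them NaN.
values = np.linspace(0, 11, 12)
values[2] = np.nan
time = pd.date_range("1999-12-15", periods=12, freq=pd.DateOffset(months=1))
da = xr.DataArray(values, coords=[time], dims="time")


def sum_allowing_missing(x, axis=None, max_missing=0):
    """Hypothetical reducer: NaN-aware sum that gives up on gappy groups."""
    missing = np.isnan(x).sum(axis=axis)   # number of NaNs along the reduced axis
    total = np.nansum(x, axis=axis)        # sum that ignores NaNs
    # Keep the sum only where the number of gaps is within the tolerance.
    return np.where(missing > max_missing, np.nan, total)


# No NaN tolerated: behaves like np.sum for this data.
strict = da.resample(time="QS-DEC").reduce(sum_allowing_missing)

# One NaN per quarter tolerated: behaves like np.nansum for this data.
lenient = da.resample(time="QS-DEC").reduce(sum_allowing_missing, max_missing=1)

print(strict.values)
print(lenient.values)
```

A tolerance parameter like this sits between the two extremes shown above (np.sum and np.nansum), which can be useful when a single gap per period is acceptable but several are not.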

CamiloEr