I have a NetCDF file loaded into an xarray Dataset and I want to compute daily climatologies without the leap day, that is, without 29th Feb. I'm trying the Dataset.drop method, but the syntax is not intuitive to me. Here is the Dataset:

print(ds)
>><xarray.Dataset>
Dimensions:        (lat: 1, lev: 1, lon: 720, time: 27133)
Coordinates:
* lon            (lon) float32 -180.0 -179.5 -179.0 ... 178.5 179.0 179.5
* lev            (lev) float32 1.0
* time           (time) datetime64[ns] 2000-01-02T18:00:00 ... 2018-07-30
Dimensions without coordinates: lat
Data variables:
Var1              (time, lev, lon) float32 ...
Var2              (time, lat, lon) float64 ...
Var3              (time, lat, lon) float64 ...

I tried

ds_N_R.drop(['Var1', 'Var2', 'Var3'], time='2000-02-29')
>>TypeError: drop() got an unexpected keyword argument 'time'
##another approach
ds_N_R.sel(time='2000-02-29').drop(['Var1', 'Var2', 'Var3'])
## does not give the result I intended
<xarray.Dataset>
Dimensions:  (lev: 1, lon: 720, time: 4)
Coordinates:
* lon      (lon) float32 -180.0 -179.5 -179.0 -178.5 ... 178.5 179.0 179.5
* lev      (lev) float32 1.0
* time     (time) datetime64[ns] 2000-02-29 ... 2000-02-29T18:00:00
Data variables:
*empty*

How do I proceed here? It would be great to know if there is a direct method for calculating daily climatologies that considers only 365 days of a year, but I would also like to know how to remove data from a particular time step when required.

Light_B

2 Answers

The right way to use drop() here would be: ds_N_R.drop([np.datetime64('2000-02-29')], dim='time')

But I think this could actually be more cleanly done with an indexing operation, e.g., ds_N_R.sel(time=~((ds_N_R.time.dt.month == 2) & (ds_N_R.time.dt.day == 29)))
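
A minimal sketch of the indexing approach on synthetic data. The real file is not available, so the dataset contents and the names `is_leap_day`, `ds_no_leap` and `ds_dropped` are assumptions made for illustration. It also shows `drop_sel`, which newer xarray releases provide for dropping by label; every 6-hourly timestamp of the day is passed, since a single `'2000-02-29'` label only matches the midnight step.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 6-hourly series covering a leap year (2000) and a non-leap year (2001).
time = pd.date_range("2000-01-01", "2001-12-31 18:00", freq=pd.Timedelta(hours=6))
ds = xr.Dataset(
    {"Var1": (("time", "lon"), np.random.rand(time.size, 3))},
    coords={"time": time, "lon": [-180.0, 0.0, 179.5]},
)

# Boolean mask that is True for every time step falling on 29 February.
is_leap_day = (ds.time.dt.month == 2) & (ds.time.dt.day == 29)

# Keep everything except the masked time steps.
ds_no_leap = ds.sel(time=~is_leap_day)

# Label-based removal: pass every timestamp of that day, not just midnight.
ds_dropped = ds.drop_sel(time=ds.time.values[is_leap_day.values])

print(ds.sizes["time"], ds_no_leap.sizes["time"], ds_dropped.sizes["time"])
```

The same mask can be reused for any other "remove these time steps" problem by changing the condition that builds it.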

shoyer
  • The drop method only removes the first time step of '2000-02-29' and leaves the other 3 time steps for that day. But the 'sel' method you suggested is brilliant. I couldn't have figured out by myself that I should use 'time.dt.month' instead of 'time.month', since 'time' is a DataArray. What I find a bit frustrating is that it takes me many tries to get the correct syntax for new functions. I tried reading the source code of the function, but it seems it would take more time and effort on my side to get a good grasp of it. – Light_B Nov 20 '18 at 10:12
  • I can give an example of what I meant above by the syntax not coming intuitively to me. When I use 'groupby' to calculate a climatology, for example, it works without 'time.dt': ds.groupby('time.day').mean(dim='time'), and in fact 'time.dt.day' gives an error there. But when using the 'sel' method, 'time.month' gives an error. – Light_B Nov 20 '18 at 10:57
  • The 'sel' method you suggested above removes the time steps on 29th Feb, but when I calculate daily climatologies my time axis again has 366 values instead of 365. 'Var1_updated' has 20 time steps fewer than the main array, and my data covers 2000-2018. To calculate daily climatologies, I'm using daily_clim = Var1_updated.groupby('time.dayofyear').mean(dim='time'). It gives me . Then I thought that the values at 'dayofyear' = 60 (leap day) should be a NaN array, but I'm surprised to see that this is not so. – Light_B Nov 20 '18 at 11:49
  • I found a workaround by manually removing the leap-day value from the calculated climatology. I will post it as a separate answer. But maybe it's worthwhile to look at why the daily climatology still has values there even after dropping them from the original data. If I'm not making a trivial error in understanding, should I open an issue on GitHub? – Light_B Nov 21 '18 at 10:28
  • Note that the `dayofyear` attribute represents the "ordinal day" which in pandas is defined as "days since December 31st the preceding year." Therefore all years will contain a date that has an ordinal day of 60; in a non-leap year this date will be March 1st, while in a leap year this date will be February 29th. If I understand your intended use-case correctly (daily climatologies, i.e. grouping by "matching month and day number") I think you might be interested in the discussion in [this GitHub issue](https://github.com/pydata/xarray/issues/1844#issuecomment-417855365). – spencerkclark Nov 21 '18 at 13:01
  • @spencerkclark Thanks Spencer, that's exactly what I'm trying to do. I noticed that both of my approaches, the one in the original question and the one in the comment, are completely wrong, but the thread helped me understand how the grouping happens. Your solution is pretty helpful. However, I was also trying this [pandas](https://stackoverflow.com/a/20974207/7763326) approach. Would something like that be possible in future versions of xarray? – Light_B Nov 21 '18 at 17:50
  • That solution in pandas is very nice. I think something like that would be made possible with the addition of multi-argument groupby in xarray (see some initial work toward that [here](https://github.com/pydata/xarray/pull/924)). Progress on that has been delayed some due to a [broader re-envisioning of MultiIndex support](https://github.com/pydata/xarray/issues/1603), but for sure it is on the radar. – spencerkclark Nov 22 '18 at 17:42
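
To make the `dayofyear` point from the comments concrete, here is a small sketch on synthetic daily data. It is one possible way to group by matching month and day, not necessarily the approach from the linked GitHub issue; the array contents and the names `clim_doy`, `month_day` and `clim_md` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Ordinal day 60 is 29 Feb in a leap year but 1 Mar in a non-leap year.
print(pd.Timestamp("2000-02-29").dayofyear, pd.Timestamp("2001-03-01").dayofyear)

# Synthetic daily series for 2000-2003 with 29 Feb removed.
time = pd.date_range("2000-01-01", "2003-12-31", freq="D")
da = xr.DataArray(
    np.arange(time.size, dtype=float), coords={"time": time}, dims="time", name="Var1"
)
da = da.sel(time=~((da.time.dt.month == 2) & (da.time.dt.day == 29)))

# Grouping by dayofyear still yields 366 groups: 31 Dec 2000 is ordinal day 366.
clim_doy = da.groupby("time.dayofyear").mean("time")

# Grouping by a "MM-DD" label matches calendar dates instead of ordinal days.
month_day = da.time.dt.strftime("%m-%d").rename("month_day")
clim_md = da.groupby(month_day).mean("time")

print(clim_doy.sizes["dayofyear"], clim_md.sizes["month_day"])
```

With the leap day removed, the ordinal-day grouping mixes 1 March of leap years into day 61 rather than day 60, while the month-day grouping keeps each calendar date in its own bin.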

You can convert your calendar to a 'noleap' one using xarray's convert_calendar, that is, ds_N_R.convert_calendar('noleap').

Per the xarray documentation (https://docs.xarray.dev/en/stable/generated/xarray.Dataset.convert_calendar.html): "If the source and target calendars are either no_leap, all_leap or a standard type, only the type of the time array is modified. When converting to a leap year from a non-leap year, the 29th of February is removed from the array."

user39360
  • By the way, this feature was added only recently; it was not available when the question was asked. The approach shoyer showed can be adapted to a variety of other selection problems. – Light_B Dec 20 '22 at 10:52