Context:
I have a data frame similar to this one, except that the real data spans several decades:
import pandas as pd

df = pd.DataFrame({
    'time': ['2003-02-02', '2003-02-03', '2003-02-04', '2003-02-05', '2003-02-06',
             '2003-02-07', '2003-02-08', '2003-02-09', '2003-02-10', '2003-02-11'],
    'NDVI': [0.505413, 0.504566, 0.503682, 0.502759, 0.501796,
             0.500791, 0.499743, 0.498651, 0.497514, 0.496332],
})
df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d')  # parse strings to timestamps
df.set_index('time', inplace=True)                          # use the dates as the index
Output:
                NDVI
time
2003-02-02  0.505413
2003-02-03  0.504566
2003-02-04  0.503682
2003-02-05  0.502759
2003-02-06  0.501796
2003-02-07  0.500791
2003-02-08  0.499743
2003-02-09  0.498651
2003-02-10  0.497514
2003-02-11  0.496332
Problem:
I would like to:
- Get the mean NDVI value over a custom time interval that restarts at the beginning of every year. If the interval is, e.g., 10 days, values will be binned as [Jan-1 : Jan-10], [Jan-11 : Jan-20], etc. The last interval of the year will be only 5 or 6 days long, depending on whether it is a leap year (i.e. the 361st to the 365th/366th day of the year).
- Add a column for the corresponding interval number, so the output would be something similar to this:
                NDVI  yr_interval
time
2003-01-31  0.505413            4
2003-02-10  0.497514            5
In the above example, the first line represents the 4th 10-day interval of year 2003.
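To make the numbering rule concrete, here is a small sketch of the arithmetic I have in mind. The helper name interval_number is hypothetical (my own, not from any library), and it assumes intervals are counted from 1 starting on January 1:

```python
from datetime import date


def interval_number(d, days=10):
    """Return the 1-based number of the fixed-length interval of the year
    that contains date d (hypothetical helper, for illustration only)."""
    day_of_year = d.timetuple().tm_yday  # 1 for Jan 1, 365 or 366 for Dec 31
    return (day_of_year - 1) // days + 1


print(interval_number(date(2003, 1, 31)))           # 4
print(interval_number(date(2003, 2, 10)))           # 5
print(interval_number(date(2003, 12, 31)))          # 37 (short last interval, days 361-365)
print(interval_number(date(2004, 12, 31)))          # 37 (leap year, days 361-366)
print(interval_number(date(2003, 2, 10), days=8))   # 6
```

With a 10-day interval this gives 37 intervals per year, the last one being the short one.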
Question:
How can I implement this, given that:
- For time series spanning several years, the interval number should restart every year (similar to the behaviour of pandas.Series.dt.week)?
- The code should be flexible enough to test other interval lengths (e.g. 8 days)?
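
For reference, this is a rough sketch of the kind of result I am after, not necessarily the best way to do it. The function name interval_means and its parameters days and col are placeholders of my own. It assumes the frame has a DatetimeIndex, and it labels each row of the output with the first calendar day of its interval:

```python
import pandas as pd


def interval_means(df, days=10, col="NDVI"):
    """Mean of `col` per fixed-length interval, restarting every Jan 1.

    Hypothetical helper: assumes df has a DatetimeIndex.
    """
    idx = df.index
    # Offset (in days) of each date from the start of its interval.
    offset = (idx.dayofyear - 1) % days
    # First calendar day of the interval each date falls into. Start dates
    # differ between years, so grouping on them also separates the years.
    start = idx.normalize() - pd.to_timedelta(offset, unit="D")
    out = df.groupby(start)[col].mean().to_frame()
    out.index.name = idx.name
    # 1-based interval number within the year; the last bin is just shorter.
    out["yr_interval"] = (out.index.dayofyear - 1) // days + 1
    return out


# Small demo on the sample data from the question.
sample = pd.DataFrame(
    {"NDVI": [0.505413, 0.504566, 0.503682, 0.502759, 0.501796,
              0.500791, 0.499743, 0.498651, 0.497514, 0.496332]},
    index=pd.date_range("2003-02-02", periods=10, name="time"),
)
result = interval_means(sample, days=10)
print(result)  # two rows: intervals 4 and 5 of 2003
```

Changing days to 8 (or any other length) should be enough to test other intervals, since the last bin of the year is truncated automatically by the day-of-year arithmetic.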