I processing time-series data within a pandas dataframe. The datetime index is incomplete (i.e. some dates are missing).
I want to create a new column with a datetime series of 1 year offset, but only containg dates present in the original datetimeindex . The challenge is that the exact 1y match is not present in the index in many cases.
Index (Input) 1 year offset (Output)
1/2/2014 None
1/3/2014 None
1/6/2014 None
1/7/2014 None
1/9/2014 None
1/10/2014 None
1/2/2015 1/2/2014
1/5/2015 1/3/2014
1/6/2015 1/6/2014
1/7/2015 1/7/2014
1/8/2015 1/9/2014
1/9/2015 1/10/2014
The requirements are as follows:
- Every date as of 1/2/2015 must have a corresponding offset date (no blanks)
Every date within the "offset date" group must also be present in the Index column (i.e. introduction of new dates, like 1/8/2014, is not desired
All offset dates must be ordered in an ascending way (the sequence of dates must be preserved)
What I have tried so far:
- The Dateoffset doesn't help, since it is insensitive to dates not present in the index.
- The .shift method
data["1 year offset (Output)"] = data.Index.shift(365)
doesn't help because the number of dates within the index is different across the years.
What I am trying to do now has several steps:
- Apply Dateoffset method at first to create "temp 1 year offset"
Remove single dates from "temp 1 year offset" that are not present in datetimeindex using
set(list)
method and replace cells by NaNSelect dates in datetimeindex whose "temp 1 year offset" is NaN and substract one year
Map the Dates from (3) to its closest date in the datetimeindex using
argmin
The challenge here is that I am getting double entries as well as a descending order of days in some cases. Those mess up with the results in the following way (see the timedeltas between day n and day n+1):
Index (Input) 1 year offset (Output) Timedelta
4/17/2014 4/16/2014 1
4/22/2014 4/17/2014 1
4/23/2014 4/25/2014 8
4/24/2014 None
4/25/2014 4/22/2014 -3
4/28/2014 4/23/2014 1
4/29/2014 4/24/2014 1
4/30/2014 4/25/2014 1
In any case, this last approach seems to be an overkill concerning the simplicity of the underlying goal. Is there a faster and more simple way to do it?
How to group every date in an uneven pandas datetime series with the closest date one year ago in the same series?