Based on this thread: Pandas Subset of a Time Series Without Resampling
The goal is to return the latest date in a month (with a value), and return that value.
Sample code:
Date CumReturn
3/31/2017 1
4/3/2017 .99
5/31/2017 1.022
4/4/2017 100
4/28/2017 1.012
5/1/2017 1.011
6/30/2017 1.033
import pandas as pd
df = pd.read_clipboard(parse_dates = ['Date'])
df.set_index('Date')
df
I thought this would work:
df.groupby(pd.Grouper(freq = 'M')).max()
But it returns the dates corresponding to the highest values (CumReturn
), rather than the max dates in the index.
df.groupby(pd.Grouper(freq = 'M')).last()
However, the output shows that the last day in April is chosen, rather than the latest day in the df. pandas assigns the value from April 28 to April 30, and returns this df:
CumReturn
Date
2017-03-31 1.000
2017-04-30 1.012
2017-05-31 1.022
2017-06-30 1.033
What causes this behavior? I assume pandas is just picking the latest date in each month, but that seems odd since those dates aren't present in the original data.