I want to resample()
my daily data into six-month chunks. However, I want the ends of the six-month chunks to be the ends of April and October. If I use df.resample('6M').sum()
(or df.groupby(pd.Grouper(freq='6M').sum()
), the end of the first six-month chunk is the end of the first month in the data. I know about anchored offsets, but I do not know how to create a custom anchored offset (e.g., '6M-APR'
does not work).
Here is some example code:
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame(
data={'logret': np.random.randn(1000)},
index=pd.date_range(start='2001-05-25', periods=1000, freq='B')
)
df.resample('6M').sum()
Which yields the following output:
logret
2001-05-31 2.2950148716254297
2001-11-30 -12.536360930670858
2002-05-31 5.468848462868161
2002-11-30 13.027927629740189
2003-05-31 -10.37282118563155
2003-11-30 -0.156275418330286
2004-05-31 -3.0768727498370905
2004-11-30 28.328856464071546
2005-05-31 -3.6462613215100546
I have not achieved my goal (six-month resampling that ends in April and October) with the start
, offset
, and loffset
arguments to .resample()
.
I have achieved my goal with the hack below. However, it loses the date index, and I would like a more robust/repeatable approach.
def sixmonth(d, b=4):
y, m, h = d.year, d.month, 1
if (m > (b + 6)): y += 1
elif (m > b): h += 1
return y + h/10
df.groupby(sixmonth).sum()
Which yields the following output without a date:
logret
2001.2 -10.300839024148
2002.1 9.321994034984547
2002.2 8.855517878860585
2003.1 -2.4576797445001493
2003.2 -7.002919570231796
2004.1 -9.36895555474087
2004.2 27.13038641177464
2005.1 3.154551390326532
Of course, I could improve this hack. But is there a better/robust/repeatable solution for n-period resampling that ends in arbitrary months?