I'm having trouble figuring out how to resample a pandas datetime-indexed DataFrame while requiring a minimum number of non-missing values in each bin before a value is returned. I'd like to resample daily data to monthly, and require at least 90% of the values in a month to be present to yield a value.
With an input of daily data:
import numpy as np  # pd.np was deprecated and later removed from pandas
import pandas as pd
rng = pd.date_range('1/1/2011', periods=365, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts['2011-01-01':'2011-01-05'] = np.nan  # add a short run of NaNs to the time series
ts['2011-10-03':'2011-10-30'] = np.nan  # add an almost month-long run of NaNs to the time series
which has only a few NaNs in January but almost a full month of NaNs in October, I'd like the output of my monthly resampled sum:
ts.resample('M').sum()
to give NaN for October (more than 90% of daily data missing) and a value for January (less than 90% of daily data missing), instead of the current output:
2011-01-31 11.949479
2011-02-28 -1.730698
2011-03-31 -0.141164
2011-04-30 -0.291702
2011-05-31 -1.996223
2011-06-30 -1.936878
2011-07-31 5.025407
2011-08-31 -1.344950
2011-09-30 -2.035502
2011-10-31 -2.571338
2011-11-30 -13.492956
2011-12-31 7.100770
I've read this post, which uses a rolling mean with min_periods, but I'd prefer to keep using resample for its direct time-based indexing. Is this possible? I haven't found much in the resample docs or on Stack Overflow that addresses this.
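To make the desired behaviour concrete, here is a minimal sketch of one possible approach that stays with resample. It is an illustration under assumptions, not an established recipe: the helper name resample_sum_min_frac and its min_frac parameter are hypothetical (not part of pandas), and it uses the 'MS' (month start) alias so that it runs on both older and newer pandas versions, which means the result is labelled by the first day of each month rather than the last. The idea is to compare the per-bin count of non-NaN values against the per-bin size and mask the sums that fall short.

```python
import numpy as np
import pandas as pd

# Same shape of input as in the question: one year of daily data with two gaps.
rng = pd.date_range('1/1/2011', periods=365, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts['2011-01-01':'2011-01-05'] = np.nan  # 5 of 31 January days missing
ts['2011-10-03':'2011-10-30'] = np.nan  # 28 of 31 October days missing


def resample_sum_min_frac(series, rule, min_frac):
    """Hypothetical helper: resampled sum, NaN where too few values are present.

    min_frac is the minimum fraction of non-NaN values a bin must contain.
    """
    grouped = series.resample(rule)
    present = grouped.count()  # non-NaN observations per bin
    total = grouped.size()     # all observations per bin, NaN included
    sums = grouped.sum()       # NaNs are skipped by default
    # Keep a sum only where enough of the bin is populated; otherwise NaN.
    return sums.where(present >= min_frac * total)


# 'MS' labels each bin by the first day of the month.
monthly = resample_sum_min_frac(ts, 'MS', min_frac=0.8)
print(monthly)
```

With this sample data, January has 26 of 31 values present (about 84%) and October has 3 of 31 (about 10%), so a threshold of 0.8 keeps January and masks October, whereas a strict threshold of 0.9 would mask January as well.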