This is an extension of an earlier question I asked: For a given month, find whether there is at least one 8 day period where 7 of those 8 days are marked TRUE (python)
The purpose of my code is to recognize whether there is an 8 day time window with at least 7 "TRUE" day values within it. The following code works most of the time. However, in this case it is returning False because the user has no data on the last day of the 8 day window (2018-01-08) so the script does not recognize that there is an 8 day window at all, even though they technically do have 7 True values within an 8 day window:
import pandas as pd
ids = 1
req_met = ['TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE']
date = ['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04', '2018-01-05', '2018-01-06', '2018-01-07']
df = pd.DataFrame({'id':ids,
'Date':date,
'req_met':req_met})
print(df)
df['req_met'] = df['req_met'].replace({'TRUE':True, 'FALSE':False})
df['Date'] = pd.to_datetime(df.Date)
df = (df.set_index('Date')
.groupby('id').req_met
.resample('D').sum()
.reset_index()
)
df['Month'] = df.Date.dt.strftime("%Y-%m")
df_result = (df.groupby(['id','Month'])
.rolling(8)['req_met'].sum().ge(7)
.groupby(['id','Month'])
.agg({('req_met','max')})
.reset_index()
)
print(df_result)
id Month req_met
0 1 2018-01 False
One way to fix this would be to extend the end date of the resample to include one extra day. However, the solutions I've tried such as this one: pandas- changing the start and end date of resampled timeseries are returning errors, I think because of all the grouping.
Any assistance much appreciated.