0

This is an extension of an earlier question I asked: For a given month, find whether there is at least one 8 day period where 7 of those 8 days are marked TRUE (python)

The purpose of my code is to recognize whether there is an 8 day time window with at least 7 "TRUE" day values within it. The following code works most of the time. However, in this case it is returning False because the user has no data on the last day of the 8 day window (2018-01-08) so the script does not recognize that there is an 8 day window at all, even though they technically do have 7 True values within an 8 day window:

import pandas as pd

ids = 1
req_met = ['TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE']
date = ['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04', '2018-01-05', '2018-01-06', '2018-01-07']

df = pd.DataFrame({'id':ids,
                   'Date':date,
                   'req_met':req_met})
print(df)

df['req_met'] = df['req_met'].replace({'TRUE':True, 'FALSE':False})
df['Date'] = pd.to_datetime(df.Date)

df = (df.set_index('Date')
        .groupby('id').req_met
        .resample('D').sum()
        .reset_index()
        )

df['Month'] = df.Date.dt.strftime("%Y-%m")

df_result = (df.groupby(['id','Month'])
             .rolling(8)['req_met'].sum().ge(7)
             .groupby(['id','Month'])
             .agg({('req_met','max')})
             .reset_index()
             )

print(df_result)

   id    Month  req_met
0   1  2018-01    False

One way to fix this would be to extend the end date of the resample to include one extra day. However, the solutions I've tried such as this one: pandas- changing the start and end date of resampled timeseries are returning errors, I think because of all the grouping.

Any assistance much appreciated.

Dr Wampa
  • 443
  • 3
  • 8

1 Answers1

1

Try adding min_periods to your rolling.

df_result = (df.groupby(['id','Month'])
         .rolling(8, min_periods=7)['req_met'].sum().ge(7)
         .groupby(['id','Month'])
         .agg({('req_met','max')})
         .reset_index()
         )

min_periods will allow you to specify minimum number of observations in window required to have a value. You are looking for observation of minimum = 7, so in your case, min_periods = 7.

Emma
  • 8,518
  • 1
  • 18
  • 35