I have a question that extends "Pandas: conditional rolling count". I would like to create a new column in a dataframe that reflects the cumulative count of rows that meet several criteria.
Using the following example and code from Stack Overflow question 25119524:
import pandas as pd
l1 =["1", "1", "1", "2", "2", "2", "2", "2"]
l2 =[1, 2, 2, 2, 2, 2, 2, 3]
l3 =[45, 25, 28, 70, 95, 98, 120, 80]
cowmast = pd.DataFrame(list(zip(l1, l2, l3)))
cowmast.columns =['Cow', 'Lact', 'DIM']
def rolling_count(val):
    if val == rolling_count.previous:
        rolling_count.count += 1
    else:
        rolling_count.previous = val
        rolling_count.count = 1
    return rolling_count.count
rolling_count.count = 0 #static variable
rolling_count.previous = None #static variable
cowmast['xmast'] = cowmast['Cow'].apply(rolling_count) #new column in dataframe
cowmast
The output is xmast (the number of mastitis events so far) for each cow:
  Cow  Lact  DIM  xmast
0   1     1   45      1
1   1     2   25      2
2   1     2   28      3
3   2     2   70      1
4   2     2   95      2
5   2     2   98      3
6   2     2  120      4
7   2     3   80      5
What I would like to do is restart the count for each cow (Cow) and lactation (Lact), and only increment the count when the number of days (DIM) between rows is more than 7.
To incorporate more than one condition, so that the count resets for each cow's lactation (Lact), I used the following code:
def count_consecutive_items_n_cols(df, col_name_list, output_col):
    cum_sum_list = [
        (df[col_name] != df[col_name].shift(1)).cumsum().tolist() for col_name in col_name_list
    ]
    df[output_col] = df.groupby(
        ["_".join(map(str, x)) for x in zip(*cum_sum_list)]
    ).cumcount() + 1
    return df
count_consecutive_items_n_cols(cowmast, ['Cow', 'Lact'], 'Lxmast')  # output column name passed as a string
That produces the following output
  Cow  Lact  DIM  xmast  Lxmast
0   1     1   45      1       1
1   1     2   25      2       1
2   1     2   28      3       2
3   2     2   70      1       1
4   2     2   95      2       2
5   2     2   98      3       3
6   2     2  120      4       4
7   2     3   80      5       1
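For comparison, the same per-lactation count can probably be reached more directly with a plain groupby. This is only a sketch: it assumes that every row with the same Cow and Lact values belongs to one group, regardless of whether those rows are consecutive, which is a slightly different rule from the consecutive-run logic above. On this example data the two give the same result.

```python
import pandas as pd

# Same example data as above
cowmast = pd.DataFrame({
    "Cow": ["1", "1", "1", "2", "2", "2", "2", "2"],
    "Lact": [1, 2, 2, 2, 2, 2, 2, 3],
    "DIM": [45, 25, 28, 70, 95, 98, 120, 80],
})

# cumcount() numbers the rows inside each (Cow, Lact) group from 0,
# so adding 1 gives a running count that restarts at every new lactation.
cowmast["Lxmast"] = cowmast.groupby(["Cow", "Lact"]).cumcount() + 1
print(cowmast)
```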
I would appreciate insight as to how to add another condition in the cumulative count that takes into consideration the time between mastitis events (difference in DIM between rows for cows within the same Lact). If the difference in DIM between rows for the same cow and lactation is less than 7 then the count should not increment.
The output I am looking for is called "Adjusted" in the table below.
Cow Lact DIM xmast Lxmast Adjusted 0 1 1 45 1 1 1 1 1 2 25 2 1 1 2 1 2 28 3 2 1 3 2 2 70 1 1 1 4 2 2 95 2 2 2 5 2 2 98 3 3 2 6 2 2 120 4 4 3 7 2 3 80 5 1 1
In the example above, for cow 1, lact 2, the count is not incremented when DIM goes from 25 to 28, as the difference between the two events is less than 7 days. The same applies to cow 2, lact 2, when it goes from 95 to 98. For the larger increments, 70 to 95 and 98 to 120, the count is increased.
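One possible way to express the rule described above is sketched below. It is not a definitive solution and rests on a few assumptions that are not stated in the question: rows are already sorted by DIM within each cow and lactation, each row is compared with the row immediately before it (not with the last row that was counted), and a gap of exactly 7 days does not increment the count. The name MIN_GAP_DAYS is made up for illustration.

```python
import pandas as pd

# Same example data as above
cowmast = pd.DataFrame({
    "Cow": ["1", "1", "1", "2", "2", "2", "2", "2"],
    "Lact": [1, 2, 2, 2, 2, 2, 2, 3],
    "DIM": [45, 25, 28, 70, 95, 98, 120, 80],
})

MIN_GAP_DAYS = 7  # hypothetical name; the count increments only when the gap is MORE than this

# Days since the previous row of the same cow and lactation.
# The first row of each group has no previous row, so its gap is NaN.
gap = cowmast.groupby(["Cow", "Lact"])["DIM"].diff()

# A row counts as a new event if it is the first row of its group
# or if it is more than MIN_GAP_DAYS after the previous row.
new_event = gap.isna() | (gap > MIN_GAP_DAYS)

# Running total of new events, restarted for each cow and lactation.
cowmast["Adjusted"] = (
    new_event.astype(int)
    .groupby([cowmast["Cow"], cowmast["Lact"]])
    .cumsum()
)
print(cowmast)
```

On the example data this gives 1, 1, 1, 1, 2, 2, 3, 1 for Adjusted, matching the table above. If a short gap should instead be measured from the last counted event, a per-group loop would be needed in place of diff().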
Thank you for your help
John