I've been pulling my hair out over this specific question for an hour.
I have a table with person-id, application-date, event-date, and event-type. For each person and application-date, I would like to count how many events of a specific type preceded the application-date. What makes it hard for me is that the constraint on application-date is 'fixed': I want to check, for each combination of person-id and application-date, how many events preceded it, and I don't want the constraint to be applied at row level. E.g. if the person-id is A and the application-date is 2022-01-01, I want to count the events in all of A's rows dated before 2022-01-01, and I don't want the cutoff to change to another date (e.g. 2021-01-01) depending on the row.
I wrote a loop that is horribly inefficient and doesn't scale well.
Example data:
import pandas as pd

df = pd.DataFrame({'person_id': ['A', 'A', 'A', 'B'],
                   'applied_on': pd.to_datetime(["2020-01-01", "2020-01-01", "2021-01-01", "2021-01-01"]),
                   'event_date': pd.to_datetime(["2020-03-01", "2020-09-01", "2021-03-01", "2020-03-01"]),
                   'event_indicator': [1, 0, 0, 1]})  # event_indicator indicates whether an event is of a certain type
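To make the "fixed vs. row-level" distinction concrete, here is a small sketch on the example data above. It contrasts a row-level filter (each row compared only with its own `applied_on`) with a per-pair filter (every event of the person compared with the pair's `applied_on`), which is what the loop below computes. The names `row_level`, `pairs`, `events`, `merged` and `hit` are illustrative only.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "person_id": ["A", "A", "A", "B"],
    "applied_on": pd.to_datetime(["2020-01-01", "2020-01-01", "2021-01-01", "2021-01-01"]),
    "event_date": pd.to_datetime(["2020-03-01", "2020-09-01", "2021-03-01", "2020-03-01"]),
    "event_indicator": [1, 0, 0, 1],
})

# Row-level filter: each row is compared only with its OWN applied_on,
# which is exactly the behaviour the question wants to avoid.
row_level = (
    df.assign(hit=df["event_indicator"].where(df["event_date"] < df["applied_on"], 0))
    .groupby(["person_id", "applied_on"])["hit"]
    .sum()
)

# Fixed per pair: join every (person_id, applied_on) pair with ALL events of
# that person, then compare each event with the pair's applied_on.
pairs = df[["person_id", "applied_on"]].drop_duplicates()
events = df[["person_id", "event_date", "event_indicator"]]
merged = pairs.merge(events, on="person_id")
merged["hit"] = merged["event_indicator"].where(merged["event_date"] < merged["applied_on"], 0)
per_pair = merged.groupby(["person_id", "applied_on"])["hit"].sum()

print(row_level.tolist())  # [0, 0, 1]
print(per_pair.tolist())   # [0, 1, 1]
```

The two results differ for (A, 2021-01-01): A's events on 2020-03-01 and 2020-09-01 come from rows whose own `applied_on` is 2020-01-01, so only the per-pair version counts them. Note that the self-merge creates (pairs per person) × (events per person) rows, so it can get large for people with long histories.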
My implementation:
# One row per (person_id, applied_on) combination
unique = df[['person_id', 'applied_on']].drop_duplicates().reset_index()
result = []
for index, row in unique[['person_id', 'applied_on']].iterrows():
    person = row['person_id']
    date = row['applied_on']
    # All of this person's events before this (fixed) application date
    loc = (df['person_id'] == person) & (df.event_date < date)
    numEvent = df.event_indicator[loc].sum()
    result.append([index, person, date, numEvent])
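One loop-free way to get the same counts, sketched under the assumption that "preceded" means strictly earlier (`event_date < applied_on`, as in the loop): sort events by date, keep a running count of flagged events per person, and use `pd.merge_asof` to look up, for each (person, application date) pair, the last event of that person strictly before the application date. The column names `cum_events` and `num_events` are hypothetical.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "person_id": ["A", "A", "A", "B"],
    "applied_on": pd.to_datetime(["2020-01-01", "2020-01-01", "2021-01-01", "2021-01-01"]),
    "event_date": pd.to_datetime(["2020-03-01", "2020-09-01", "2021-03-01", "2020-03-01"]),
    "event_indicator": [1, 0, 0, 1],
})

# One row per (person_id, applied_on); merge_asof needs the left frame
# sorted by its "on" key.
pairs = df[["person_id", "applied_on"]].drop_duplicates().sort_values("applied_on")

# Events sorted by date, with a running count of flagged events per person.
events = df[["person_id", "event_date", "event_indicator"]].sort_values("event_date")
events = events.assign(cum_events=events.groupby("person_id")["event_indicator"].cumsum())

# For each pair, take the person's last event strictly before applied_on;
# its running count is the number of flagged events that preceded it.
result = pd.merge_asof(
    pairs,
    events[["person_id", "event_date", "cum_events"]],
    left_on="applied_on",
    right_on="event_date",
    by="person_id",
    direction="backward",
    allow_exact_matches=False,  # strictly earlier, like event_date < date
)

# Pairs with no earlier event get NaN from merge_asof; count those as 0.
result["num_events"] = result["cum_events"].fillna(0).astype(int)
result = (
    result[["person_id", "applied_on", "num_events"]]
    .sort_values(["person_id", "applied_on"])
    .reset_index(drop=True)
)
print(result)
```

This avoids both `iterrows` and a per-person cross join: it needs two sorts plus one as-of merge, so it should scale better than the loop, though that is an expectation rather than a measured result.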
I'd appreciate any help making it better!