patient_id | alert_id | alert_timestamp |
---|---|---|
3 | xyz | 2022-10-10 |
1 | anp | 2022-10-12 |
1 | gfe | 2022-10-10 |
2 | fgy | 2022-10-02 |
2 | gpl | 2022-10-03 |
1 | gdf | 2022-10-13 |
2 | mkd | 2022-10-23 |
1 | liu | 2022-10-01 |
I have a data frame (see simplified version above) where for each patient_id
, I want to only keep the latest alert (i.e. last one) that was sent out in a given window period e.g. window_size = 7
.
Note, the window size needs to look at consecutive days i.e. between day 1
-> day 1 + window_size
. The ranges of alert_timestamp
for each patient_id
varies and is usually well beyond the window_size
range.
Note, that the data frame example given above, is a very simple example and will have many more patient_id
's and will be in a mixed order in terms alert_timestamp
and alert_id
.
The approach is to start from the last alert_timstamp
for a given patient_id
and work back using the window_size
to select the alert that was the last one in that window time frame.
Please note the idea is to have a scanning/looking window, example window_size = 7 days
to move across the timestamps of each patient
The end result I want, is a data frame with the filtered out alerts
Expected output for (this example) window_size = 7
:
patient_id | alert_id | alert_timestamp |
---|---|---|
1 | liu | 2022-10-01 |
1 | gdf | 2022-10-13 |
2 | gpl | 2022-10-03 |
2 | mkd | 2022-10-23 |
3 | xyz | 2022-10-10 |
What's the most efficient way to solve for this?