Below is a sample data frame:
df = pd.DataFrame({'StudentName': ['Anil','Ramu','Ramu','Anil','Peter','Peter','Anil','Ramu','Peter','Anil'],
'ExamDate': ['2021-01-10','2021-01-20','2021-02-22','2021-03-30','2021-01-04','2021-06-06','2021-04-30','2021-07-30','2021-07-08','2021-09-07'],
'Result': ['Fail','Pass','Fail','Pass','Pass','Pass','Pass','Pass','Fail','Pass']})
StudentName ExamDate Result
0 Anil 2021-01-10 Fail
1 Ramu 2021-01-20 Pass
2 Ramu 2021-02-22 Fail
3 Anil 2021-03-30 Pass
4 Peter 2021-01-04 Pass
5 Peter 2021-06-06 Pass
6 Anil 2021-04-30 Pass
7 Ramu 2021-07-30 Pass
8 Peter 2021-07-08 Fail
9 Anil 2021-09-07 Pass
For each row, I would like to calculate the number of days it has been since that student's last failed test:
df = pd.DataFrame({'StudentName': ['Anil','Ramu','Ramu','Anil','Peter','Peter','Anil','Ramu','Peter','Anil'],
'ExamDate': ['2021-01-10','2021-01-20','2021-02-22','2021-03-30','2021-01-04','2021-06-06','2021-04-30','2021-07-30','2021-07-08','2021-09-07'],
'Result': ['Fail','Pass','Fail','Pass','Pass','Pass','Pass','Pass','Fail','Pass'],
'LastFailedDays': [0, 0, 0, 79, 0, 0, 110, 158, 0, 240]})
StudentName ExamDate Result LastFailedDays
0 Anil 2021-01-10 Fail 0
1 Ramu 2021-01-20 Pass 0
2 Ramu 2021-02-22 Fail 0
3 Anil 2021-03-30 Pass 79
4 Peter 2021-01-04 Pass 0
5 Peter 2021-06-06 Pass 0
6 Anil 2021-04-30 Pass 110
7 Ramu 2021-07-30 Pass 158
8 Peter 2021-07-08 Fail 0
9 Anil 2021-09-07 Pass 240
For example:
- Anil failed on 2021-01-10, so for that row it will be zero days.
- Anil's next record, which is successful, is on 2021-03-30, so the number of days for that row will be the number of days from his previous failed date 2021-01-10 to 2021-03-30, which is 79 days.
- Anil's third record, which is also successful, is on 2021-04-30, so the number of days there will be again, the number of days 2021-01-10 (his last failed date) to 2021-04-30, which is 110 days.
It is doable with regular loops but I am looking for a more conventional Pandas solution. I'm guessing it's possible with groupby
.