have a data frame data as below
InsuranceId InsuranceStatus Date
0 Ins1234 DuePayment 2020-06-07 23:59:43.123456+00:00
1 Ins1234 Successful 2019-06-07 23:59:43.123456+00:00
2 Ins1234 Successful 2018-06-07 23:59:43.123456+00:00
3 Ins5678 DuePayment 2020-07-07 22:59:32.123421+00:00
4 Ins5678 Successful 2019-07-07 22:59:32.123421+00:00
5 Ins5678 Successful 2018-07-07 22:59:32.123421+00:00
trying to create a rownumber/rank on basis of group by InsuranceId and max(Date)
df['RowNum'] = df.groupby('InsuranceId')['InsuranceStatus']['Date'].rank(method="first", ascending=True)
and
df['RowNum'] = df.groupby(by=['InsuranceId'])['InsuranceStatus']['Date'].transform(lambda x: x.rank())
by referring SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe
Error: Index Error: Columns status already selected
trying to achieve below output
InsuranceId InsuranceStatus Date RowNum
0 Ins1234 DuePayment 2020-06-07 23:59:43.123456+00:00 1
1 Ins1234 Successful 2019-06-07 23:59:43.123456+00:00 2
2 Ins1234 Successful 2018-06-07 23:59:43.123456+00:00 3
3 Ins5678 DuePayment 2020-07-07 22:59:32.123421+00:00 1
4 Ins5678 Successful 2019-07-07 22:59:32.123421+00:00 2
5 Ins5678 Successful 2018-07-07 22:59:32.123421+00:00 3
Is there anything i am missing to add. pls any suggestions
Final output:
InsuranceId InsuranceStatus Date
Ins1234 DuePayment 2020-06-07 23:59:43.123456+00:00
Ins5678 DuePayment 2020-07-07 22:59:32.123421+00:00