I have a dataframe like the one below, and I would like to remove duplicate rows that share the same Month, based on two criteria. 1) If Startdate is later than Month, the row is removed. 2) If Startdate is earlier than Month, keep only the record with the latest Startdate.
> COMP Month Startdate bundle result
> 0 TD3M 2018-03-01 2015-08-28 01_Essential keep
> 1 TD3M 2018-03-01 2018-07-17 04_Complete remove
> 2 TD3M 2018-04-01 2015-08-28 01_Essential keep
> 3 TD3M 2018-04-01 2018-07-17 04_Complete remove
> 4 TD3M 2018-05-01 2015-08-28 01_Essential keep
> 5 TD3M 2018-05-01 2018-07-17 04_Complete remove
> 6 TD3M 2018-06-01 2015-08-28 01_Essential keep
> 7 TD3M 2018-06-01 2018-07-17 04_Complete remove
> 8 TD3M 2018-08-01 2015-08-28 01_Essential remove
> 9 TD3M 2018-08-01 2018-07-17 04_Complete keep
> 10 TD3M 2018-09-01 2015-08-28 01_Essential remove
> 11 TD3M 2018-09-01 2018-07-17 04_Complete keep
The expected output would be:
> COMP Month Startdate bundle
> 0 TD3M 2018-03-01 2015-08-28 01_Essential
> 2 TD3M 2018-04-01 2015-08-28 01_Essential
> 4 TD3M 2018-05-01 2015-08-28 01_Essential
> 6 TD3M 2018-06-01 2015-08-28 01_Essential
> 9 TD3M 2018-08-01 2018-07-17 04_Complete
> 11 TD3M 2018-09-01 2018-07-17 04_Complete
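For reference, here is a minimal sketch of one way the two rules might be expressed in pandas. It rebuilds the sample data above, and it assumes that both date columns are (or can be converted to) datetimes and that duplicates are defined per COMP and Month pair. The names `valid` and `result` are only illustrative, and the case where Startdate equals Month is not covered by the rules, so the strict `<` comparison here is a guess.

```python
import pandas as pd

# Rebuild the sample data from the question (two rows per Month).
months = ["2018-03-01", "2018-04-01", "2018-05-01",
          "2018-06-01", "2018-08-01", "2018-09-01"]
df = pd.DataFrame({
    "COMP": ["TD3M"] * 12,
    "Month": pd.to_datetime([m for m in months for _ in range(2)]),
    "Startdate": pd.to_datetime(["2015-08-28", "2018-07-17"] * 6),
    "bundle": ["01_Essential", "04_Complete"] * 6,
})

# Rule 1: drop rows whose Startdate is later than Month.
# Assumption: rows with Startdate == Month are also dropped (strict <).
valid = df[df["Startdate"] < df["Month"]]

# Rule 2: within each (COMP, Month) group, keep the row with the latest
# Startdate. Sorting by Startdate and keeping the last duplicate does this;
# sort_index() restores the original row order afterwards.
result = (
    valid.sort_values("Startdate")
         .drop_duplicates(subset=["COMP", "Month"], keep="last")
         .sort_index()
)

print(result)
```

On the sample data this keeps rows 0, 2, 4, 6, 9 and 11, which matches the expected output above.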