I have a dataset with these columns and values
ID Date Weight Pct
1 11/01/2020 12.14 1.53
1 12/14/2020 12.89 3.07
2 05/27/2020 18.24 4.61
3 16/19/2020 13.56 1.89
4 07/05/2020 15.12 2.48
4 08/15/2020 11.45 1.25
For rows with duplicate IDs I am interested in
- retaining only those rows that are earliest
(slice(which.min(Date)))
if the corresponding Pct column value is minimum(slice(which.min(Pct)))
- If the earliest rows has the largest value of
Pct
for that ID, then ignore the, row with earliest date criteria. Only retain the row with min Pct.
Expected Output
ID Date Weight Pct
1 11/01/2020 12.14 1.53
2 05/27/2020 18.24 4.61
3 16/19/2020 13.56 1.89
4 08/15/2020 11.45 1.25
My question is , does checking for earliest date per ID become redundant and just reduce to checking if Pct is minimum of the duplicate IDs and retain only those rows with min of Pct per ID ?