0

I have a dataset with these columns and values

 ID    Date       Weight    Pct
 1     11/01/2020 12.14     1.53
 1     12/14/2020 12.89     3.07
 2     05/27/2020 18.24     4.61
 3     16/19/2020 13.56     1.89
 4     07/05/2020 15.12     2.48
 4     08/15/2020 11.45     1.25

For rows with duplicate IDs I am interested in

  • retaining only those rows that are earliest (slice(which.min(Date))) if the corresponding Pct column value is minimum (slice(which.min(Pct)))
  • If the earliest rows has the largest value of Pct for that ID, then ignore the, row with earliest date criteria. Only retain the row with min Pct.

Expected Output

ID    Date       Weight    Pct
 1     11/01/2020 12.14     1.53
 
 2     05/27/2020 18.24     4.61
 3     16/19/2020 13.56     1.89
 
 4     08/15/2020 11.45     1.25

My question is , does checking for earliest date per ID become redundant and just reduce to checking if Pct is minimum of the duplicate IDs and retain only those rows with min of Pct per ID ?

Science11
  • 788
  • 1
  • 8
  • 24
  • 3
    First, you need to make sure you're dealing with dates; as is, you're dealing with strings, and `min(.)` of a string in that format will fail; use `as.Date(x$Date, format="%m/%d/%Y")` for conversion. Second, you're grouping by `ID`, see https://stackoverflow.com/q/11562656/3358272. Third, you mention min-`Pct` and min-`Date`; choose which you want *first*, then use the second for a tie-breaker. – r2evans Oct 08 '21 at 19:02
  • 1
    Date`16/19/2020` doesn't look correct. Is that a typo ? – Ronak Shah Oct 09 '21 at 05:27
  • @RonakShah, yes you are correct. – Science11 Oct 09 '21 at 05:33
  • To your question, if your Pct column is unique per ID, i.e if you can wnsure that the same ID doesn‘t have the same Pct value twice, then yes, you could just take the minimum of Pct and ignore the Date. – deschen Oct 09 '21 at 06:05

0 Answers0