I have a data frame with roughly 300,000 observations of 11 variables. Below is a snapshot of the columns I want to use:
Location.type Package.ID Version WeekID Name
Office 301502 3.0 201542 William
Office 301502 2.7 201542 Claire
Production 9764933 1.6 201214 John
Home 298793 2.6 201746 Bill
Home 298793 2.5 201738 William
Production 2803789 4.2 201605 Brad
Production 2803789 4.19 201605 Richard
Production 2803789 4.18 201605 Vanessa
When several rows share both the same Package.ID and the same WeekID, I want to keep only the row with the highest Version, with all of its other columns intact, and drop the rest. My desired output is:
Location.type Package.ID Version WeekID Name
Office 301502 3.0 201542 William
Production 9764933 1.6 201214 John
Home 298793 2.6 201746 Bill
Home 298793 2.5 201738 William
Production 2803789 4.2 201605 Brad
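The data frame is probably an R object (the dotted column names such as `Package.ID` suggest so), but no code was given. The following is therefore only a language-neutral sketch of the rule in plain Python, assuming the snapshot above as the input. It walks the rows once, keys each row on the pair (Package.ID, WeekID), and keeps the row with the largest Version seen for that pair. `keep_max_version` is a hypothetical helper name. Version is compared as a number, so 4.2 beats 4.19, which matches the desired output. The question does not say how ties should be handled, so this sketch assumes the first row wins.

```python
# Snapshot from the question, typed in by hand (assumed input).
COLUMNS = ["Location.type", "Package.ID", "Version", "WeekID", "Name"]
rows = [dict(zip(COLUMNS, r)) for r in [
    ("Office", 301502, 3.0, 201542, "William"),
    ("Office", 301502, 2.7, 201542, "Claire"),
    ("Production", 9764933, 1.6, 201214, "John"),
    ("Home", 298793, 2.6, 201746, "Bill"),
    ("Home", 298793, 2.5, 201738, "William"),
    ("Production", 2803789, 4.2, 201605, "Brad"),
    ("Production", 2803789, 4.19, 201605, "Richard"),
    ("Production", 2803789, 4.18, 201605, "Vanessa"),
]]


def keep_max_version(rows):
    """Hypothetical helper: one row per (Package.ID, WeekID), highest Version wins."""
    best = {}  # (Package.ID, WeekID) -> row with the highest Version so far
    for row in rows:
        key = (row["Package.ID"], row["WeekID"])  # composite key on two columns
        # Strict '>' means that on a tie the earlier row is kept (assumption).
        if key not in best or row["Version"] > best[key]["Version"]:
            best[key] = row
    # Replacing a value does not move its key, so groups come out in the
    # order in which each (Package.ID, WeekID) pair first appeared.
    return list(best.values())


for row in keep_max_version(rows):
    print(*row.values())
```

Every column of the winning row is carried along, because the whole row is stored rather than just the Version value.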
My question is similar to Remove duplicates with largest absolute value. However, in that question the duplicates are identified by a single column, whereas in my case they are identified by the combination of two columns (Package.ID and WeekID). This may be a simple adjustment, but I could not figure it out myself.
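A common pattern for the one-column case is "sort by the value in descending order, then keep the first occurrence of each key". Under the assumption that this is the approach being adapted, the adjustment amounts to building the duplicate key from both columns instead of one. The plain-Python sketch below illustrates that idea. `dedupe_on` and its parameters are hypothetical names, and the input is again the hand-typed snapshot. Passing `["Package.ID"]` gives the one-column behaviour, and passing `["Package.ID", "WeekID"]` gives the two-column behaviour asked for here.

```python
# Snapshot from the question, typed in by hand (assumed input).
COLUMNS = ["Location.type", "Package.ID", "Version", "WeekID", "Name"]
rows = [dict(zip(COLUMNS, r)) for r in [
    ("Office", 301502, 3.0, 201542, "William"),
    ("Office", 301502, 2.7, 201542, "Claire"),
    ("Production", 9764933, 1.6, 201214, "John"),
    ("Home", 298793, 2.6, 201746, "Bill"),
    ("Home", 298793, 2.5, 201738, "William"),
    ("Production", 2803789, 4.2, 201605, "Brad"),
    ("Production", 2803789, 4.19, 201605, "Richard"),
    ("Production", 2803789, 4.18, 201605, "Vanessa"),
]]


def dedupe_on(rows, key_cols, value_col):
    """Hypothetical helper: keep the row with the largest value_col per key."""
    # Visit row indices from highest to lowest value. sorted() is stable even
    # with reverse=True, so among equal values the earlier row is visited first.
    order = sorted(range(len(rows)), key=lambda i: rows[i][value_col], reverse=True)
    seen, kept = set(), []
    for i in order:
        key = tuple(rows[i][c] for c in key_cols)  # one or more key columns
        if key not in seen:  # first visit = highest value for this key
            seen.add(key)
            kept.append(i)
    return [rows[i] for i in sorted(kept)]  # restore the original row order


# Two-column key, as in this question.
for row in dedupe_on(rows, ["Package.ID", "WeekID"], "Version"):
    print(*row.values())
```

With the one-column key `["Package.ID"]`, the Home row for William (WeekID 201738) would also be dropped, because it shares its Package.ID with Bill's higher-Version row. Adding WeekID to the key is what keeps both Home rows.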