0

I would like to take this dataset and remove the values if they have the same id and age(duplicates) but keep the one with the highest month number.

ID|Age|Month|
1   25   7
1   25  12
2   18  10
2   18  11
3   12  10
3   25  10
4   19  10
5   10  2
5   10  3

And have the outcome be

ID|Age|Month
 1   25   12
 2   18   11
 3   12   10
 3   25   10
 4   19   10
 5   10    3

Note that it removed the duplicates but kept the version with the highest month number.

user35131
  • 1,105
  • 6
  • 18

2 Answers2

1

as a solution option

library(tidyverse)
df <- read.table(text = "ID Age Month 
1   25   7
1   25  12
2   18  10
2   18  11
3   12  10
3   25  10
4   19  10
5   10  2
5   10  3", header = T)

df %>% 
  group_by(ID, Age) %>% 
  slice_max(Month)
#> # A tibble: 6 x 3
#> # Groups:   ID, Age [6]
#>      ID   Age Month
#>   <int> <int> <int>
#> 1     1    25    12
#> 2     2    18    11
#> 3     3    12    10
#> 4     3    25    10
#> 5     4    19    10
#> 6     5    10     3

Created on 2021-02-11 by the reprex package (v1.0.0)

Yuriy Saraykin
  • 8,390
  • 1
  • 7
  • 14
0

Using dplyr package, the solution:

df %>% 
+   group_by(ID, Age) %>%
+   filter(Month == max(Month))
# A tibble: 6 x 3
# Groups:   ID, Age [6]
     ID   Age Month
  <dbl> <dbl> <dbl>
1     1    25    12
2     2    18    11
3     3    12    10
4     3    25    10
5     4    19    10
6     5    10     3
Terru_theTerror
  • 4,918
  • 2
  • 20
  • 39