
I have a dataframe in R like this:

id  year othercolumns
1   2017 ...
2   2017 ...
1   2018 ...
2   2018 ...
3   2018 ...
4   2018 ...
1   2019 ...
2   2019 ...
3   2019 ...
4   2019 ...
5   2019 ...

I need to keep one row per id: the record from the first year in which that id appears. The result I need is this:

id year othercolumns
1  2017 ...
2  2017 ...
3  2018 ...
4  2018 ...
5  2019 ...

My data can have any start year, but the end year will always be 2020.

dvera

`data[!duplicated(data$id), ]`. Or `data %>% filter(!duplicated(id))`. Keeping the first record is the default behavior of `duplicated`. – Gregor Thomas May 09 '22 at 23:56
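
A minimal base R sketch of the `duplicated` approach from the comment above. It assumes the rows may not already be ordered by year, so it sorts first; the `value` column is a hypothetical stand-in for the other columns.

```r
# Hypothetical example data; `value` stands in for the other columns
data <- data.frame(
  id    = c(1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5),
  year  = c(2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019),
  value = letters[1:11]
)

# Sort by year so the first occurrence of each id is its earliest year
data <- data[order(data$year), ]

# duplicated() flags every occurrence of an id after the first,
# so negating it keeps exactly one row per id
first_rows <- data[!duplicated(data$id), ]
first_rows
```

If the data is already sorted by year, the `order()` step can be skipped.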

1 Answer


Using dplyr,

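library(dplyr)

# Example data: one row per id and year
df <- data.frame(
  id = c(1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5),
  year = c(2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019)
)

# first() takes the first row of each group, so this assumes rows are sorted by year.
# Note: summarise() keeps only id and year; any other columns are dropped.
df %>%
  group_by(id) %>%
  summarise(year = first(year))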
#> # A tibble: 5 × 2
#>      id  year
#>   <dbl> <dbl>
#> 1     1  2017
#> 2     2  2017
#> 3     3  2018
#> 4     4  2018
#> 5     5  2019

Created on 2022-05-10 by the reprex package (v2.0.1)
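
If the other columns need to be kept, or the rows are not guaranteed to be sorted by year, one possible variant uses `slice_min()`. This is a sketch assuming dplyr 1.0.0 or later; the `value` column is a hypothetical stand-in for the other columns.

```r
library(dplyr)

# Hypothetical example data; `value` stands in for the other columns
df <- data.frame(
  id    = c(1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5),
  year  = c(2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019),
  value = letters[1:11]
)

# For each id, keep the single row with the smallest year.
# with_ties = FALSE guarantees one row per id even if a year repeats.
first_rows <- df %>%
  group_by(id) %>%
  slice_min(year, n = 1, with_ties = FALSE) %>%
  ungroup()
first_rows
```

Unlike `summarise()`, this keeps every column of the selected row and does not depend on the input row order.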

YH Jang