I have a dataframe with a column of IDs, some of which occur multiple times, a column of dates, and several other columns. For each ID I want to select the row with the earliest date, and keep the remaining columns in the data set.

library(dplyr)  # provides %>%, group_by(), summarise(), distinct()

ID <- c("1", "2", "3", "3", "3", "4")
date <- c("2019-10-06", "2019-08-29", "2019-08-09", "2019-02-01", "2019-11-17", "2019-05-24")
filler <- 1:6
df <- data.frame(ID, date, filler)

df$date <- as.Date(df$date)
dfunique <- df %>% group_by(ID) %>% summarise(min_date = min(date))

I have tried the summarise function, which selects the correct rows but drops the filler column. I have also tried the distinct function, which keeps all columns but chooses the wrong rows, because it keeps the first row per ID in the current row order rather than the one with the earliest date.

dfunique2 <- df %>% distinct(ID, .keep_all = TRUE)

I hope to get a dataframe like

|ID  | date     | filler|
|:--:|:--------:|:-----:|
| 1  |2019-10-06|  1    |
| 2  |2019-08-29|  2    |
| 3  |2019-02-01|  4    |
| 4  |2019-05-24|  6    |

How can I include the remaining column(s) of my dataframe while selecting the correct rows? Thanks.

Troels

1 Answer

df$date <- as.Date(df$date)
df %>% group_by(ID) %>%
  filter(date == min(date))

# A tibble: 4 x 3
# Groups:   ID [4]
  ID    date       filler
  <chr> <date>      <int>
1 1     2019-10-06      1
2 2     2019-08-29      2
3 3     2019-02-01      4
4 4     2019-05-24      6
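
One thing to keep in mind with `filter(date == min(date))`: if an ID has two rows that share the same earliest date, both rows are kept. If exactly one row per ID is needed, a possible variant is `slice_min()` with `with_ties = FALSE`. This is only a sketch, assuming dplyr 1.0.0 or later (where `slice_min()` is available) and the example data from the question; which of the tied rows should win is an assumption you would have to decide for your own data.

```r
library(dplyr)

# Example data from the question, with date already converted to Date
df <- data.frame(
  ID = c("1", "2", "3", "3", "3", "4"),
  date = as.Date(c("2019-10-06", "2019-08-29", "2019-08-09",
                   "2019-02-01", "2019-11-17", "2019-05-24")),
  filler = 1:6
)

dfunique <- df %>%
  group_by(ID) %>%
  # keep the single row with the earliest date in each group;
  # with_ties = FALSE returns only one row if several share that date
  slice_min(date, n = 1, with_ties = FALSE) %>%
  # drop the grouping so later operations act on the whole table
  ungroup()

dfunique
```

The `ungroup()` at the end is optional, but it avoids surprises if you go on to summarise or mutate the result.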
AnilGoyal