0

I have a dataset where information has been collected multiple times for each ID. I would like to subset it by removing the row with the earliest date for each ID. For example, from this:

ID Timestamp Sex Income
AC1 2015-08-25
AC1 2016-10-05
AC1 2016-12-04

To this:

ID Timestamp Sex Income
AC1 2016-10-05
AC1 2016-12-04

Any ideas on how to do this in R?

Phil
n23

2 Answers

2

I would use dplyr to remove the earliest date for each group. I'm providing some data here.

library(dplyr)

df <- structure(list(ID = c(1, 1, 1, 2, 2, 2), time = structure(c(1325485800, 
1325487600, 1325489400, 1325491200, 1325493000, 1325494800), class = c("POSIXct", 
"POSIXt"), tzone = "")), class = "data.frame", row.names = c(NA, 
-6L))

df.updated <- df %>% 
  dplyr::group_by(ID) %>% 
  dplyr::slice(-which.min(time)) 

When asking a question, be sure to include data so that your example is reproducible. You can do this by pasting the output of dput(head(df)); a small sample of the data is usually enough to solve the problem.
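Applied to the column names from the question, the same approach can be sketched as follows. The data frame df_q is made up for illustration: the AC1 rows mirror the question, while the AC2 rows and the use of Date values for Timestamp are assumptions, not the asker's real data.

```r
library(dplyr)

# Hypothetical data shaped like the question's example.
# The AC2 rows are invented so that more than one group is present.
df_q <- data.frame(
  ID = c("AC1", "AC1", "AC1", "AC2", "AC2"),
  Timestamp = as.Date(c("2015-08-25", "2016-10-05", "2016-12-04",
                        "2017-01-10", "2016-03-01"))
)

df_q_updated <- df_q %>%
  group_by(ID) %>%
  slice(-which.min(Timestamp)) %>%  # drop the row holding each ID's earliest date
  ungroup()

print(df_q_updated)
```

Note that which.min() returns only the first minimum, so if an ID has several rows tied for the earliest date, only one of them is dropped.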

AndrewGB
  • Thank you! This worked really well! How might I change the code if I decided I wanted to remove all times but the two most recent? – n23 Jun 27 '21 at 02:37
  • 1
    df.updated <- df %>% arrange(ID, timestamp) %>% group_by(ID) %>% slice_tail(n = 2) %>% ungroup() – Joe Erinjeri Jun 27 '21 at 11:16
1

Another, slightly less elegant variation uses a different dplyr verb, arrange. You could do something like

library(dplyr)

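df_new <- df %>%
    arrange(ID, timestamp) %>%  # sort by date within each ID; use your own date column name
    group_by(ID) %>%
    slice(-1) %>%               # drop the first (earliest) row of each group
    ungroup()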

slice(-1) removes the first row of each group. Because the data have already been sorted by timestamp within each ID, that first row is the earliest one, which leaves you with what you need.
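The same arrange-then-slice pattern also covers the follow-up asked in the comments, keeping only the two most recent rows per ID. This is a minimal sketch rather than a definitive solution: df_demo and its values are hypothetical, and slice_tail() requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data: two IDs with four and three timestamps respectively.
df_demo <- data.frame(
  ID = c("AC1", "AC1", "AC1", "AC1", "AC2", "AC2", "AC2"),
  Timestamp = as.Date(c("2015-08-25", "2016-10-05", "2016-12-04", "2017-02-01",
                        "2016-03-01", "2016-04-01", "2016-05-01"))
)

df_recent <- df_demo %>%
  arrange(ID, Timestamp) %>%  # oldest first within each ID
  group_by(ID) %>%
  slice_tail(n = 2) %>%       # keep the last two rows, i.e. the two most recent
  ungroup()

print(df_recent)
```

An alternative that skips the sort is slice_max(Timestamp, n = 2), though by default it keeps ties and so may return more than two rows per ID.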

AndrewGB
Joe Erinjeri