-3

I have the following dataset:

df

email_id           date
xyz@gmail.com   23-12-2018 21:33
xyz@gmail.com   23-12-2018 21:34
xyz@gmail.com   23-12-2018 21:35
xyz@gmail.com   23-12-2018 21:36
xyz@gmail.com   23-12-2018 21:37
abc@yahoo.com   23-12-2018 21:09
abc@yahoo.com   23-12-2018 21:10
abc@yahoo.com   23-12-2018 21:11
abc@yahoo.com   23-12-2018 21:12
abc@yahoo.com   23-12-2018 21:13
lmn@outlook.com 23-12-2018 21:44
lmn@outlook.com 23-12-2018 21:45
lmn@outlook.com 23-12-2018 21:46
lmn@outlook.com 23-12-2018 21:47

I am trying to find unique emails having the latest time stamp. The output is something as shown below:

email_id    date
xyz@gmail.com   23-12-2018 21:37
abc@yahoo.com   23-12-2018 21:13
lmn@outlook.com 23-12-2018 21:47

Can this be done using dplyr or should i trying some sql group by query? Help needed.

Nishant
  • 1,063
  • 13
  • 40

1 Answers1

1

Using data.table:

DT[, date := as.POSIXct(date, "%d-%m-%Y %H:%M", tz = "")]
DT[, .SD[which.max(date)], email_id]
          email_id                date
1:   xyz@gmail.com 2018-12-23 21:37:00
2:   abc@yahoo.com 2018-12-23 21:13:00
3: lmn@outlook.com 2018-12-23 21:47:00

Where:

DT <- fread("email_id,           date
xyz@gmail.com,   23-12-2018 21:33
xyz@gmail.com,   23-12-2018 21:34
xyz@gmail.com,   23-12-2018 21:35
xyz@gmail.com,   23-12-2018 21:36
xyz@gmail.com,   23-12-2018 21:37
abc@yahoo.com,   23-12-2018 21:09
abc@yahoo.com,   23-12-2018 21:10
abc@yahoo.com,   23-12-2018 21:11
abc@yahoo.com,   23-12-2018 21:12
abc@yahoo.com,   23-12-2018 21:13
lmn@outlook.com, 23-12-2018 21:44
lmn@outlook.com, 23-12-2018 21:45
lmn@outlook.com, 23-12-2018 21:46
lmn@outlook.com, 23-12-2018 21:47")
s_baldur
  • 29,441
  • 4
  • 36
  • 69