Group by and select min date with data.table

Question

My Data

df1 <- structure(list(ID = c("A", "A", "A", "B", "B", "C"), c1 = 1:6, 
c2 = 1:6, myDate = c("01.01.2015", "02.02.2014", "03.01.2014", 
"09.09.2009", "10.10.2010", "06.06.2011")), .Names = c("ID", 
"c1", "c2", "myDate"), class = "data.frame", row.names = c(NA,-6L))

My desired output (note: A df, keeping all columns!):

ID    c1    c2    myDate
A     3     3     03.01.2014
B     4     4     09.09.2009
C     6     6     06.06.2011
....

My Code

library(data.table)
setDT(df1)
df1[,myDate:=as.Date(myDate, "%d.%m.%Y")]
test2 <- df1[,.(myDate == min(myDate)), by = ID]

That gives me in my corresponding column (myDate) a logical where the condition matches. But, thats not df and all the other columns get lost. I am fairly new to the data.table package so any help would be appreciated.

The answer to your other question sure looks sufficient: http://stackoverflow.com/a/33417481/1191259 — Frank, Oct 30 '15 at 13:29
A more exact duplicate of this question is http://stackoverflow.com/q/16573995/1191259 — Frank, Oct 30 '15 at 13:30
@Frank Thanks for the second link. The reason I posted a second question is because I did not understand the answer properly. — four-eyes, Oct 30 '15 at 13:35

akrun · Accepted Answer · 2015-10-30T13:09:11.677

12

We can use which.min to get the index and use .SD to get the Subset of Data.table.

setDT(df1)[, .SD[which.min(as.Date(myDate, '%d.%m.%Y'))], by = ID]
#   ID c1 c2     myDate
#1:  A  3  3 03.01.2014
#2:  B  4  4 09.09.2009
#3:  C  6  6 06.06.2011

Or if there are ties and we need all the min value rows, use ==

setDT(df1)[, {tmp <- as.Date(myDate, '%d.%m.%Y'); .SD[tmp==min(tmp)] }, ID]
#ID c1 c2     myDate
#1:  A  3  3 03.01.2014
#2:  B  4  4 09.09.2009
#3:  C  6  6 06.06.2011

Other option would be to get the row index (.I) and then subset. It would be fast

setDT(df1)[df1[, .I[which.min(as.Date(myDate, '%d.%m.%Y'))], ID]$V1]
# ID c1 c2     myDate
#1:  A  3  3 03.01.2014
#2:  B  4  4 09.09.2009
#3:  C  6  6 06.06.2011

edited Oct 30 '15 at 13:09

answered Oct 30 '15 at 13:03

akrun

874,273
37
540
662

Neat. Could you point me somewhere where I could read more about `I` and `SD` – four-eyes Oct 30 '15 at 13:29
@Chrissi Try [here](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf) and [here](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-faq.pdf) for understanding a bit more. – akrun Oct 30 '15 at 14:48
Hi @akrun, do you have recommendations for what to do if I needed to find the min amongst several columns? – Jantje Houten Jun 06 '21 at 17:57
@JantjeHouten Try `df1 %>% summarise(across(yourcolumns, min))` – akrun Jun 06 '21 at 18:56
@akrun thank you, and if I have a grouping, I assume ill do the grouping before summarise? Do you have a solution for data.table? I think maybe it involves using .SDcols? I currently have something like this: tmp = tmp[, min(.SD, na.rm = TRUE), by = .(DCOF_ID, PRESCRIBED_DATE, DRUG_NAME), .SDcols = c("DOSE_BREAKFAST", "DOSE_LUNCH", "DOSE_DINNER", "DOSE_SUPPER")] – Jantje Houten Jun 07 '21 at 05:24

Group by and select min date with data.table

1 Answers1

Linked