Sort and then select the first row by group with data.table?

Asked Oct 12 '17 at 19:54

Active Oct 13 '17 at 12:19

How do I first sort and then select the first row by group with data.table?

Originally the procedure was implemented using dplyr:

dat_dplyr <- dat %>% group_by(V1, V2) %>% arrange(V1, V2, desc(V3), desc(V4)) %>% filter(row_number() == 1)

This works, but it is a bit slow. What would be the data.table equivalent? Would it be the following?

DT <- as.data.table(dat)
test <- DT[order(-V3,-V4), .SD[1], by = .(V1, V2)]

Thanks very much for your help!

edited Oct 12 '17 at 21:34

Frank

asked Oct 12 '17 at 19:54

TCL

Hm, I don't know why that's slow. `.SD[1]` is optimized to be pretty fast, I think. You could instead try `unique(DT[order(-V3,-V4)], by=c("V1","V2"))`. Similarly, you could just use `arrange` and `distinct` with dplyr. It will probably depend on your data, but David's "self-join" here may be faster: https://stackoverflow.com/a/41838383/ – Frank Oct 12 '17 at 21:37
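For reference, here is a minimal sketch of the two data.table variants mentioned above, run on a small made-up table. The values are hypothetical and only the column names V1 to V4 come from the question. On this toy data both variants pick the same rows, but it says nothing about which one is faster on real data.

```r
library(data.table)

# Toy data with hypothetical values; only the column names come from the question.
DT <- data.table(
  V1 = c("a", "a", "a", "b", "b"),
  V2 = c(1, 1, 1, 2, 2),
  V3 = c(10, 30, 30, 5, 7),
  V4 = c(1, 2, 3, 9, 8)
)

# Variant from the question: sort by V3 and V4 descending,
# then take the first row of each (V1, V2) group.
first_sd <- DT[order(-V3, -V4), .SD[1], by = .(V1, V2)]

# Variant from the comment: sort the same way,
# then keep only the first row seen for each (V1, V2) pair.
first_unique <- unique(DT[order(-V3, -V4)], by = c("V1", "V2"))

print(first_sd)
print(first_unique)
```

Note that groups come back in order of first appearance in the sorted table, so the row order may differ from the dplyr pipeline, which also sorts by V1 and V2.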