How do I first sort and then select the first row by group with data.table?

Originally, the procedure was implemented using dplyr:

dat_dplyr <- dat %>% group_by(V1, V2) %>% arrange(V1, V2, desc(V3), desc(V4)) %>% filter(row_number() == 1)

This works, but it is a bit slow. What would the data.table equivalent be? Would it be the following?

DT <- as.data.table(dat)
test <- DT[order(-V3,-V4), .SD[1], by = .(V1, V2)]

Thanks very much for your help!

TCL
    Hm, I don't know why that's slow. `.SD[1]` is optimized to be pretty fast, I think. You could instead try `unique(DT[order(-V3,-V4)], by=c("V1","V2"))`. Similarly, you could just use `arrange` and `distinct` with dplyr. It will probably depend on your data, but David's "self-join" here may be faster: https://stackoverflow.com/a/41838383/ – Frank Oct 12 '17 at 21:37
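To make the comparison concrete, here is a minimal sketch that runs the `.SD[1]` form from the question and the `unique(..., by = ...)` form from the comment side by side. The toy data is hypothetical and only reuses the column names `V1` to `V4` from the question; it says nothing about relative speed on real data, which would need to be benchmarked separately.

```r
library(data.table)

# Hypothetical toy data; only the column names come from the question.
DT <- data.table(
  V1 = c("a", "a", "a", "b", "b"),
  V2 = c(1, 1, 1, 2, 2),
  V3 = c(5, 9, 9, 3, 7),
  V4 = c(1, 2, 8, 4, 0)
)

# Form from the question: sort by V3 and V4 descending,
# then take the first row of each (V1, V2) group.
res_sd <- DT[order(-V3, -V4), .SD[1], by = .(V1, V2)]

# Form from the comment: sort the same way,
# then keep the first occurrence of each (V1, V2) pair.
res_unique <- unique(DT[order(-V3, -V4)], by = c("V1", "V2"))

# Put both results in the same row order before comparing them.
setorder(res_sd, V1, V2)
setorder(res_unique, V1, V2)

print(res_sd)
print(res_unique)
```

On this toy input both forms should keep the same row per group, namely the one with the largest `V3` and, among ties, the largest `V4`.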
