I have a data.table
whose rows consist of id, month, date and value. I would like to select the last observation for each (id,month) pair. Currently I do the following which is slow. Ideas for a better implementation?
require(data.table)
d = data.table(id = rep(1:3, each = 12*3), mo = rep(1:12,each=3), day = 1:3, key=c("id", "mo", "day")) #this is a data table with 3 ids, 12 months and 3 days per month
system.time(rslt <- d[,tail(.SD,1),by=list(id, mo)])
# user system elapsed
# 0.019 0.000 0.020
There should be a better way to do this. NOTE: the date at the end of the month could vary by id so i can't simply do setkey(d, day); d[J(3)]
. One potential solution I was thinking of is extracting all the (id, mo)
pairs and doing:
setkey(d, id, mo)
p = unique(d)[,day:=NULL]
setkey(p, id, mo)
system.time(rslt2 <- d[p, mult="last"])
# user system elapsed
# 0.001 0.000 0.001
identical(rslt2, rslt)
# [1] TRUE