data.table implements asof (also known as rolling or LOCF) joins out of the box. I've found this related question:
Filling in missing (blanks) in a data table, per category - backwards and forwards
but that question has NAs in the data. In my case I'm following the advice there to keep the data irregular and join to it using roll=TRUE. What I'd like to do, instead of carrying the last observation forward, is carry the next observation backward, as efficiently as possible.
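To make the roll=TRUE behaviour concrete, here is a minimal sketch on hypothetical toy data (the quotes and lookup tables, their columns and their values are made up for illustration and are not my real data). Each lookup time picks up the most recent earlier observation, and a lookup time that precedes every observation gets NA:

```r
library(data.table)

# Hypothetical toy data: irregular observations and the times to look up.
quotes <- data.table(time = c(1, 3, 6), price = c(10, 11, 12), key = "time")
lookup <- data.table(time = c(0, 2, 3, 5, 7), key = "time")

# roll = TRUE carries the last observation forward to each lookup time.
locf <- quotes[lookup, roll = TRUE]
print(locf)
```

What I'm after is the mirror image of this: each lookup time should pick up the next later observation instead.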
This is what I've tried, applying time:=-time first to try to trick it. Can I do it better? Can I do it faster?
llorJoin <- function(A, B) {
  # Copy both inputs so that := below does not modify the caller's tables by reference.
  A <- copy(A)
  B <- copy(B)
  keys <- key(A)
  if (is.null(keys) || !identical(keys, key(B))) {
    stop("llorJoin::ERROR; A and B should have the same non-empty keys")
  }
  lastKey <- tail(keys, 1L)
  # Negate the last key column (the time) so that a forward roll acts as a backward roll.
  myStr <- parse(text = paste0(lastKey, ":=-as.numeric(", lastKey, ")"))
  A <- A[, eval(myStr)]; setkeyv(A, keys)
  B <- B[, eval(myStr)]; setkeyv(B, keys)
  origin <- "1970-01-01 00:00.00 UTC"
  A <- B[A, roll = TRUE]
  # Undo the negation and convert the time column back to POSIXct.
  myStr2 <- parse(text = paste0(lastKey, ":=as.POSIXct(-", lastKey, ",origin=origin)"))
  A <- A[, eval(myStr2)]; setkeyv(A, keys)
  return(A)
}
library(data.table)
A <- data.table(
  time = as.POSIXct(c("10:01:01", "10:01:02", "10:01:04", "10:01:05", "10:01:02", "10:01:01", "10:01:01"), format = "%H:%M:%S"),
  b = c("a", "a", "a", "a", "b", "c", "c"),
  d = c(1, 1.9, 2, 1.8, 5, 4.1, 4.2)
)
B <- data.table(
  time = as.POSIXct(c("10:01:01", "10:01:03", "10:01:00", "10:01:01"), format = "%H:%M:%S"),
  b = c("a", "a", "c", "d"),
  e = c(1L, 2L, 3L, 4L)
)
setkey(A, b, time)
setkey(B, b, time)
library(rbenchmark)
benchmark(llorJoin(A, B), B[A, roll = T], replications = 10)
            test replications elapsed relative user.self sys.self user.child sys.child
1 llorJoin(A, B)           10   0.045        1     0.048        0          0         0
2 B[A, roll = T]           10   0.009        1     0.008        0          0         0
llorJoin(A,B) gives:
   b                time  e   d
1: a 2013-01-12 09:01:01  1 1.0
2: a 2013-01-12 09:01:02  2 1.9
3: a 2013-01-12 09:01:04 NA 2.0
4: a 2013-01-12 09:01:05 NA 1.8
5: b 2013-01-12 09:01:02 NA 5.0
6: c 2013-01-12 09:01:01 NA 4.1
7: c 2013-01-12 09:01:01 NA 4.2
So, as a comparison, the plain asof join on the initial data is about 5x faster than llorJoin (0.009 s versus 0.045 s elapsed over 10 replications).
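For reference, here is the negation trick that llorJoin relies on, stripped down to a minimal sketch on hypothetical numeric toy data (the quotes and lookup tables and their values are made up for illustration). It assumes a plain numeric time column, so it leaves out the POSIXct conversion and the grouping key used above:

```r
library(data.table)

# Hypothetical toy data: irregular observations and the times to look up.
quotes <- data.table(time = c(1, 3, 6), price = c(10, 11, 12))
lookup <- data.table(time = c(0, 2, 3, 5, 7))

# Work on copies so that := does not modify the original tables by reference.
q <- copy(quotes)[, time := -time]
l <- copy(lookup)[, time := -time]
setkey(q, time)
setkey(l, time)

# A forward roll on the negated time axis is a backward roll on the original axis.
nocb <- q[l, roll = TRUE][, time := -time]
setkey(nocb, time)
print(nocb)
```

Each lookup time now picks up the next later observation, and a lookup time after the last observation gets NA. The extra copies, the two sign flips and the re-keying are the overhead I would like to avoid.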