data.table selective non-match treatment during join: LOCF roll some columns and `na.fill` different columns

Question

Let's have one irregular time series table, e.g. buy/sell transactions and resulting positions on stocks (here Apple):

library(data.table)
txns = data.table(Instrument=rep("AAPL", 3), 
                  Date=as.Date(c("2014-05-10","2014-05-15", "2014-05-20")),
                  Txn.Qty=c(100, -20, -30), # 
                  key=c("Instrument", "Date"))
txns[, Pos.Qty:=cumsum(Txn.Qty), by="Instrument"]
txns
#    Instrument       Date Txn.Qty Pos.Qty
# 1:       AAPL 2014-05-10     100     100
# 2:       AAPL 2014-05-15     -20      80
# 3:       AAPL 2014-05-20     -30      50

Then we want to track positions (and possibly daily valuation) in a second, mostly regular time-series table:

positions = data.table(Instrument=rep("AAPL", 13), 
                       Date=as.Date(setdiff(as.Date("2014-05-07") + 1:15, 
                                    txns$Date[1:2])), # gaps are possible
                       key=c("Instrument", "Date"))

out = merge(positions, txns, all=T)
out[!txns, Txn.Qty:=0]
out[, Pos.Qty:=zoo::na.locf(Pos.Qty, na.rm=F), by=Instrument] # `zoo` dependency
out
#    Instrument       Date Txn.Qty Pos.Qty
#  1:       AAPL 2014-05-08       0      NA
#  2:       AAPL 2014-05-09       0      NA
#  3:       AAPL 2014-05-10     100     100
#  4:       AAPL 2014-05-11       0     100
#  5:       AAPL 2014-05-12       0     100
#  6:       AAPL 2014-05-13       0     100
#  7:       AAPL 2014-05-14       0     100
#  8:       AAPL 2014-05-15     -20      80
#  9:       AAPL 2014-05-16       0      80
# 10:       AAPL 2014-05-17       0      80
# 11:       AAPL 2014-05-18       0      80
# 12:       AAPL 2014-05-19       0      80
# 13:       AAPL 2014-05-20     -30      50
# 14:       AAPL 2014-05-21       0      50
# 15:       AAPL 2014-05-22       0      50

I was hoping to speed up the above (i.e. roll last observation forward in some columns, fill non-joins with zeros for other columns) with a single, efficient join and roll argument, performing selectively only on some of the columns (similar to .SDcols philosophy).

Is it possible to achieve something like this using data.table's roll argument on selected columns only?

Also, additional argument fill in [.data.table would help, equivalent to na.fill (we can think about roll corresponding to na.locf)

Possible duplicate of http://stackoverflow.com/questions/26171958/fill-in-missing-values-by-group-in-data-table? — GSee, Oct 04 '14 at 23:50
@GSee thanks for link. It is related, however different situation: this Q is about handling _non-matches during the join of two tables_ (fill non-matches with zero?, mark them as NA? roll locf?, omit them?) the linked Q in contrast handles _NAs in a single table_, where those NAs might or might not be a result of previous non-match during join. — Daniel Krizian, Oct 05 '14 at 14:13
Edited the title to better describe the problem. Perhaps more related to `nomatch` argument in `[.data.table` — Daniel Krizian, Oct 05 '14 at 14:28
Thanks for filing it as a FR. Seems interesting, but also probably hard ;-). — Arun, Oct 08 '14 at 16:26

data.table selective non-match treatment during join: LOCF roll some columns and `na.fill` different columns

0 Answers0