Efficient Manipulation of Time Series in R Data Table Package

Question

Much appreciated for your comment/answer.

Context: I have a large data table of daily prices of swap rates across a dozen countries. The columns are [ID, Date, X1Y, X2Y, X3Y ... X30Y], where X..Y are columns indicating the part of the yield curve (e.g. X1Y is 1-year swap, X3Y is 3-year swap). The two keys are ID (e.g. "AUD", "GBP") and Date (e.g. "2001-04-13", "2001-04-16").

Dummy Data:

set.seed(123)
dt <- cbind(ID=rep(c("AUD","GBP"),c(100,100)),X1Y=rnorm(200),X2Y=rnorm(200),X3Y=rnorm(200))
dt <- data.table(dt)
dt[,Date := seq(from=as.IDate("2013-01-01"), by="1 day", length.out=100)]
setkeyv(dt,c("ID","Date"))

Problem 1: First generate some dummy signals. What's the syntax if there is 100 columns with fairly complicated signal generation formula coded in a separate function say genSig(X1Y)? Here's what I mean using just the 3 columns and some meaningless formula:

dt[,SIG1  :=c(0, diff(X1Y ,1)),by="ID"]
dt[,SIG2  :=c(0, diff(X2Y ,1)),by="ID"]
dt[,SIG3  :=c(0, diff(X3Y ,1)),by="ID"]

Problem 2: Carry forward column(s) based on "middle of the month". For example, using the SIG columns, I'd like to make everything after say the 15th of each month the same as the signal on the 15th, until next month's 15th. The tricky thing is that the actual data contains just trading days so some months do not have 15th if it is a weekend/holiday. Another issue is using an efficient syntax, I could achieve something similar using loop (I know..) for the start of each month just to show what I meant:

for (i in 2:length(dt$Date)){
  if(as.POSIXlt(dt[i,]$Date)$mon == as.POSIXlt(dt[i-1,]$Date)$mon){
    dt[i, SIG1 := dt[i-1,SIG1]] 
    dt[i, SIG2 := dt[i-1,SIG2]] 
    dt[i, SIG3 := dt[i-1,SIG3]] 
  }
}

I can't figure out how to deal with the "mid-month" issue since it can fall on the 15th or 16th or 17th. Like Problem 1, would appreciate if there is a smart way to insert/update multiple/dozen columns.

score 3 · Answer 1 · answered Nov 22 '13 at 17:25

As far as problem 2 goes, you can use rolling joins:

# small sample to demonstrate
dt = data.table(date = as.Date(c('2013-01-01', '2013-01-15', '2013-01-17', '2013-02-14', '2013-02-17'), '%Y-%m-%d'), val = 1:5)
dt
#         date val
#1: 2013-01-01   1
#2: 2013-01-15   2
#3: 2013-01-17   3
#4: 2013-02-14   4
#5: 2013-02-17   5

setkey(dt, date)
midmonth = seq(as.Date('2013-01-15', '%Y-%m-%d'),
               as.Date('2013-12-15', '%Y-%m-%d'),
               by = '1 month')

dt[, flag := 0]
dt[J(midmonth), flag := 1, roll = -Inf]
dt
#         date val flag
#1: 2013-01-01   1    0
#2: 2013-01-15   2    1
#3: 2013-01-17   3    0
#4: 2013-02-14   4    0
#5: 2013-02-17   5    1

And now you can cumsum the flag to obtain the grouping you want to e.g. do:

dt[, val1 := val[1], by = cumsum(flag)]
dt
#         date val flag val1
#1: 2013-01-01   1    0    1
#2: 2013-01-15   2    1    2
#3: 2013-01-17   3    0    2
#4: 2013-02-14   4    0    2
#5: 2013-02-17   5    1    5

score 1 · Accepted Answer · edited May 23 '17 at 12:19

# problem 1
nsig <- 3L
csig <- 1:nsig+1L
newcols <- paste('SIG',1:nsig,sep='')
dt[,(newcols):=0]
for (j in csig) set(dt,j=j+nsig+1L,value=c(0, diff(dt[[j]],1)))

After looking at @eddi's answer, I see that set is not so useful for problem 2. Here's what I would do:

dt[,(newcols):=lapply(newcols,function(x) get(x)[1]),by=list(ID,month(Date-14))]

According to this answer, you can subtract days from a date in this way.

Aside. Cbind-ing vectors makes a matrix. In your example, you've got a character matrix. I think you were looking for...

# Creating better data...
set.seed(123)
dt <- data.table(ID=rep(c("AUD","GBP"),c(100,100)),
  X1Y=rnorm(200),X2Y=rnorm(200),X3Y=rnorm(200),
  Date=seq(from=as.IDate("2013-01-01"), by="1 day", length.out=100))

Efficient Manipulation of Time Series in R Data Table Package

2 Answers2