Edit: fake data for the example:

df = matrix(runif(50*507), nrow = 50, ncol = 507)
df = data.frame(df)
df[,1] = seq(as.Date("2017/1/1"), as.Date("2017/2/19"), "days")
names(df) = paste0("var", 1:507)
names(df)[505:507] = c("mktrf", "smb", "hml")
names(df)[1] = "Date"

All the dependent variables:

x = df[,505:507]

All the independent variables:

y <- df[,2:504]

I have a function called shift that I'd like to apply to every column of a data frame. The function lags variables: it shifts the specified column(s) by a specified number of positions. It is defined as follows.

shift<-function(x,shift_by){
  stopifnot(is.numeric(shift_by))
  stopifnot(is.numeric(x))

  if (length(shift_by)>1)
    return(sapply(shift_by,shift, x=x))

  out<-NULL
  abs_shift_by=abs(shift_by)
  if (shift_by > 0 )
    out<-c(tail(x,-abs_shift_by),rep(NA,abs_shift_by))
  else if (shift_by < 0 )
    out<-c(rep(NA,abs_shift_by), head(x,-abs_shift_by))
  else 
    out<-x
  out
}

When I use the sapply function like this, where y is a dataframe consisting of time series variables I want to lag:

y_lag <- sapply(y,shift,-1 )

I get the following error:

Error: cannot allocate vector of size 54.2 Mb
In addition: Warning messages:
1: In unlist(x, recursive = FALSE) :
  Reached total allocation of 8072Mb: see help(memory.size)
2: In unlist(x, recursive = FALSE) :
  Reached total allocation of 8072Mb: see help(memory.size)
3: In unlist(x, recursive = FALSE) :
  Reached total allocation of 8072Mb: see help(memory.size)
4: In unlist(x, recursive = FALSE) :
  Reached total allocation of 8072Mb: see help(memory.size)
5: In unlist(x, recursive = FALSE) :
  Reached total allocation of 8072Mb: see help(memory.size)
6: In unlist(x, recursive = FALSE) :
  Reached total allocation of 8072Mb: see help(memory.size)

My question: can I use a different method to lag every element of a column, while still using the lm function? Or how do I address the memory issue I am having? I can't use a different computer.

user6883405
  • Why can't you use the `lag` function from the `dplyr` package? I don't think you need to write a new function. – MKR Mar 01 '18 at 19:44
  • Since `y` is a data.frame, should you instead do `lapply(y, shift, -1)` (or `sapply(...,simplify=FALSE)`)? Using `sapply`, it will try to convert the result to a matrix; not sure if that's what you want. – r2evans Mar 01 '18 at 19:49
  • How big is `y`? BTW: it might make more sense to make `shift.data.frame` instead of `sapply(y,shift,-1)`, since you only need to do it once, not once for each column. (You can make that function even if you are not using S3 method dispatch ... which could be appropriate in this scenario.) – r2evans Mar 01 '18 at 19:51
  • One can help better if you provide the question with data, as described in https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – MKR Mar 01 '18 at 20:02
  • Added some fake data that should help the example. – user6883405 Mar 01 '18 at 20:46
  • In my actual example, y is approx. 14000 obs of 503 variables. – user6883405 Mar 01 '18 at 20:47
  • You can use `mutate_at` instead of `sapply`. I have tried it and it worked fine. See the answer below. – MKR Mar 01 '18 at 21:07

2 Answers


I was able to get it to work using the lagpad function described in a different question here:

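lagpad <- function(x, k = 1) {
  # x: a vector or matrix (a data frame likely needs as.matrix() first)
  # Note the sign: k > 0 lags (NA rows on top), k <= 0 leads (NA rows at the bottom),
  # which is the opposite of shift_by in the question's shift()
  was_vector <- is.vector(x)
  if (was_vector) x <- matrix(x) else x <- matrix(x, nrow(x))
  if (k > 0) {
    x <- rbind(matrix(rep(NA, k * ncol(x)), ncol = ncol(x)),
               matrix(x[1:(nrow(x) - k), ], ncol = ncol(x)))
  } else {
    x <- rbind(matrix(x[(-k + 1):nrow(x), ], ncol = ncol(x)),
               matrix(rep(NA, -k * ncol(x)), ncol = ncol(x)))
  }
  # Return a plain vector if a vector was passed in, otherwise the matrix
  if (was_vector) x[1:length(x)] else x
}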

This essentially does what r2evans described: it shifts the whole df at once rather than one column at a time.
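
For comparison, here is a rough base R sketch of the same whole-data-frame idea. It is not the lagpad code above: `shift_df` is a hypothetical name, `y_small` is made-up stand-in data, and the sketch assumes the sign convention of shift from the question (a negative shift_by lags).

```r
# Hypothetical helper: shift every column of a data frame at once by
# re-indexing rows, instead of calling a function once per column.
shift_df <- function(df, shift_by) {
  stopifnot(is.data.frame(df), is.numeric(shift_by), length(shift_by) == 1)
  n <- nrow(df)
  idx <- seq_len(n) + shift_by      # row i of the result comes from row i + shift_by
  idx[idx < 1 | idx > n] <- NA      # rows that fall off either end become NA
  out <- df[idx, , drop = FALSE]    # an NA row index yields a row of NAs
  rownames(out) <- NULL             # drop the "NA" row names left by indexing
  out
}

# Tiny stand-in for y from the question (assumed data, for illustration only)
y_small <- data.frame(var2 = c(0.1, 0.2, 0.3, 0.4), var3 = c(1, 2, 3, 4))
y_small_lag <- shift_df(y_small, -1)
```

The result stays a data frame with the original column names, so it can be passed on to lm in the same way as y. Whether this avoids the allocation error will depend on how much memory the session is already using.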

user6883405

There are a couple of options to avoid the use of sapply in this case. One option is to use mutate_all:

library(dplyr)

y_lag <- mutate_all(y, shift, shift_by = -1)

tail(y_lag)
#var2      var3      var4      var5       var6      var7      var8       var9      var10
#45 0.26817677 0.9664805 0.2849259 0.6375189 0.20889115 0.1530204 0.6500325 0.78397715 0.32936124
# many more rows to follow
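
Another option, along the lines of r2evans's comment under the question, is plain lapply, which avoids the dplyr dependency. The following is only a sketch: shift is re-stated in a compact form restricted to a single shift_by value, and the small y is stand-in data rather than the real 14000 x 503 data frame.

```r
# shift() from the question, condensed and limited to a scalar shift_by
shift <- function(x, shift_by) {
  stopifnot(is.numeric(shift_by), length(shift_by) == 1, is.numeric(x))
  k <- abs(shift_by)
  if (shift_by > 0) {
    c(tail(x, -k), rep(NA, k))      # lead: drop the first k values, pad the end
  } else if (shift_by < 0) {
    c(rep(NA, k), head(x, -k))      # lag: pad the start, drop the last k values
  } else {
    x
  }
}

# Small stand-in for y (assumed data, for illustration only)
set.seed(1)
y <- data.frame(var2 = runif(5), var3 = runif(5))

y_lag <- y                                   # copy keeps names and class
y_lag[] <- lapply(y, shift, shift_by = -1)   # [] assigns columns in place
```

Because lapply returns a list and the `[]` assignment fills the existing columns, the result is never simplified into one large matrix the way sapply would do it.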
MKR