5

I have a dataframe with a column t. I want to create n lagged columns that has names like t-1,t-2 etc..

  year      t  t-1 t-2
19620101    1   NA  NA
19630102    2   1   NA
19640103    3   2   1
19650104    4   3   2
19650104    5   4   3
19650104    6   5   4

My idea is that I will do it in four steps:

  • A loop for the column names using "paste"
  • A loop for the temporary dataframes for lagged columns using "paste"
  • A loop for creating the lagged columns
  • cbind them.

But I am not able to proceed with the code. Something rough:

df_final<-lagged(df="odd",n=3)

lagged<-function(df,n){
   df<-zoo(df)
   lags<-paste("A", 1:n, sep ="_")
   for (i in 1:5) {
     odd<-as.data.frame(lag(odd$OBS_Q,-1*i,na.pad =  TRUE))

   #Cbind here
   } 

I am stuck in writing this function. Could you please show some way? Or a different simpler way of doing this....

Reference: Basic lag in R vector/dataframe


Addendum:

Real data:

x<-structure(list(DATE = 19630101:19630104, PRECIP = c(0, 0, 0,0), 
               OBS_Q = c(1.61, 1.48, 1.4, 1.33), swb = c(1.75, 1.73, 1.7,1.67), 
               gr4j = c(1.9, 1.77, 1.67, 1.58), isba = c(0.83, 0.83,0.83, 0.83), 
               noah = c(1.31, 1.19, 1.24, 1.31), sac = c(1.99,1.8, 1.66, 1.57), 
               swap = c(1.1, 1.05, 1.08, 0.99), vic.mm.day. = c(2.1,1.75, 1.55, 1.43)), 
          .Names = c("DATE", "PRECIP", "OBS_Q", "swb","gr4j", "isba", "noah", "sac", "swap", "vic.mm.day."), 
          class = c("data.table","data.frame"), row.names = c(NA, -4L))

The column to be lagged is OBS_Q.

Community
  • 1
  • 1
maximusdooku
  • 5,242
  • 10
  • 54
  • 94

3 Answers3

11

I might build something around base R's embed()

x <- c(rep(NA,2),1:6)
embed(x,3)
#      [,1] [,2] [,3]
# [1,]    1   NA   NA
# [2,]    2    1   NA
# [3,]    3    2    1
# [4,]    4    3    2
# [5,]    5    4    3
# [6,]    6    5    4

Perhaps something like this:

f <- function(x, dimension, pad) {
    if(!missing(pad)) {
        x <- c(rep(pad, dimension-1), x)
    }
    embed(x, dimension)
}
f(1:6, dimension=3, pad=NA)
#      [,1] [,2] [,3]
# [1,]    1   NA   NA
# [2,]    2    1   NA
# [3,]    3    2    1
# [4,]    4    3    2
# [5,]    5    4    3
# [6,]    6    5    4
Josh O'Brien
  • 159,210
  • 26
  • 366
  • 455
  • 3
    You pretty much know the best way to do everything, don't you? Haha – Rich Scriven Jan 20 '15 at 22:09
  • @RichardScriven -- If I said "yes", would you believe me? :) – Josh O'Brien Jan 20 '15 at 22:13
  • I've been trying to find the name of this function forever! Could not remember it. I think I had seen you use it a long time ago. – BrodieG Jan 20 '15 at 22:17
  • @BrodieG, Yes, [Arun posted this not long ago](http://stackoverflow.com/questions/27485384/reshape-of-time-series-in-r/27485984#27485984). – David Arenburg Jan 20 '15 at 22:18
  • Exquisite. Thank you. A few followup questions. I am now using df1<-f(df$x, dimension=3, pad=NA) as df is large and I want to create lagged values for only one column 'x'. After that I am cbind-ing df1 and df. This works for me. But how can I change the names of the columns in the loop as well (t-1,t-2) etc? Right now, I am getting 1,2,3,4 as column names. But thanks a ton for this. – maximusdooku Jan 20 '15 at 22:27
  • @maximusdooku -- If I understand you correctly, just do `colnames(df1) <- c("A", "B", "C", "D")` (or whatever) to rename the columns. You may also want to explicitly convert the matrix returned by `embed()` and my `f()` to a data.frame, using `df1 <- data.frame(df1)`, though whether that's necessary will depend on exactly what you're doing. – Josh O'Brien Jan 20 '15 at 22:37
  • A quite elegant solution, thanks! I add one extra step: `mat_res[,ncol(mat_res):1] ` in order to use in a LSTM model. – Pablo Casas Aug 15 '18 at 18:34
9

If you are looking for efficiency, try data.tables new shift function

library(data.table) # V >= 1.9.5
n <- 2
setDT(df)[, paste("t", 1:n) := shift(t, 1:n)][]
#    t t 1 t 2
# 1: 1  NA  NA
# 2: 2   1  NA
# 3: 3   2   1
# 4: 4   3   2
# 5: 5   4   3
# 6: 6   5   4 

Here you can set any name for your new columns (within paste) and you also don't need to bind this back to the original as this updates your data set by reference using the := operator.

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • I have installed data.table, but I am still getting this error: Error in eval(expr, envir, enclos) : could not find function "shift". Any idea why? – maximusdooku Jan 20 '15 at 22:31
  • Thank you. But I am unable to use it. I tried installing devtools first. But I am getting this error: package ‘devtools’ is not available (for R version 3.0.1). I am using RStudio. – maximusdooku Jan 20 '15 at 22:33
  • Update you R version to a newer one, it has nothing to do with Rsutio. Take a look [here](http://stackoverflow.com/questions/13656699/update-r-using-rstudio) on how. Then, you need to follow the provided link. This is the development version that you download from Github. For simplicity, open a new R seesion and run `library(devtools); install_github("Rdatatable/data.table", build_vignettes = FALSE)`. Then load your data and try the code again. – David Arenburg Jan 20 '15 at 22:34
  • THanks. Upgraded R and was able to install devtools. However, the github repo couldn't be installed. Some kind of permission error for data.table.dll. Error: Command failed (1) – maximusdooku Jan 20 '15 at 22:56
  • Maybe try to use it on a fresh R session. My guess is that it cause by the older `data.table` version being loaded – David Arenburg Jan 20 '15 at 22:58
  • Thanks. Installed now. But now, I am getting this error: Error in .Call("Cshift", x, as.integer(n), fill, type) : "Cshift" not resolved from current namespace (data.table). Thanks for helping out. – maximusdooku Jan 20 '15 at 23:09
  • Hi, could you please see my question again. I have added my actual data. I am using this now: setDT(x)[, paste("OBS_Q", 1:n) := shift(OBS_Q, 1:n)][] – maximusdooku Jan 20 '15 at 23:26
  • Oh. That's not good. At the end of installing data.table, I get this message. Not an error, but.....: *** NB: by=.EACHI is now explicit. See README to restore previous behaviour. Attaching package: ‘data.table’ The following objects are masked from ‘package:lubridate’: hour, mday, month, quarter, wday, week, yday, year The following objects are masked from ‘package:reshape2’: dcast, melt – maximusdooku Jan 20 '15 at 23:32
  • What's not good about it? It just tells you that `data.table` package has a new function called `.EACHI` and that it is using some other packages too. – David Arenburg Jan 20 '15 at 23:33
  • Nothing. I referred to it working on your setup and not mine...Not sure how to get past it. The error still persists. I did a reinstall and restart of R after using this as well: install_github("Rdatatable/data.table", quick=TRUE). Both Vignette and Quick option is leading to the same error of Cshift.... – maximusdooku Jan 20 '15 at 23:38
  • 1
    Works now. Thanks! I have a piece of code at the beginning of every script which checks for packages and installs them automatically. So it was overwriting the data.table library with the CRAN version everytime without giving any messages. THanks! – maximusdooku Jan 20 '15 at 23:52
  • Great. You don't need the `[]` at the end btw. It was only to display the output. Your data set will be updating automatically as you run the code. – David Arenburg Jan 20 '15 at 23:56
5

1) lag.zoo The lag.zoo function in the zoo package can accept a vector of lags. Here we want the 0th lag, the -1 lag and the -2 lag:

library(zoo)
cbind(DF[-2], coredata(lag(zoo(DF$t), 0:-2)))

giving:

      year lag0 lag-1 lag-2
1 19620101    1    NA    NA
2 19630102    2     1    NA
3 19640103    3     2     1
4 19650104    4     3     2
5 19650104    5     4     3
6 19650104    6     5     4

which is as you have in the question but are you sure that that is what you want? The last three rows all have the same date so the 4th row, for example, is being lagged to the same date.

2) head Defining a simple Lag function we can do this using only the base of R:

Lag <- function(x, n = 1) c(rep(NA, n), head(x, -n))  # n > 0

data.frame(DF, `t-1` = Lag(DF$t), `t-2` = Lag(DF$t, 2), check.names = FALSE)

giving:

      year t t-1 t-2
1 19620101 1  NA  NA
2 19630102 2   1  NA
3 19640103 3   2   1
4 19650104 4   3   2
5 19650104 5   4   3
6 19650104 6   5   4

Note: We used this as our data frame:

DF <- data.frame(year = c(19620101, 19630102, 19640103, 19650104, 19650104,
                        19650104), t = 1:6)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341