
I am running R 4.0.3 and have no access to the internet.

I want to lag a single column of a multi-column time series. I wasn't able to find a satisfactory answer anywhere else.

Intuitively this makes sense to me, but it just doesn't work:

library(tsbox)
data=data.frame(Date=c('2005-01-01','2005-02-01','2005-03-01','2005-04-01','2005-05-01'),
                col1 = c(1,2,3,4,5),
                col2 = c(1,2,3,4,5))
data[,'Date']= as.POSIXct(data[,'Date'],format='%Y-%m-%d')
timeseries = ts_ts(ts_long(data))
timeseries[,'col1_L1'] = lag(timeseries[,'col1'],1)

What I get:

         col1 col2 col1_L1
Jan 2005    1    1       1
Feb 2005    2    2       2
Mar 2005    3    3       3
Apr 2005    4    4       4
May 2005    5    5       5

What I would expect from this code:

         col1 col2 col1_L1
Jan 2005    1    1      NA
Feb 2005    2    2       1
Mar 2005    3    3       2
Apr 2005    4    4       3
May 2005    5    5       4
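A likely reason the attempt above has no effect: `stats::lag()` on a `ts` object shifts the series' time attributes (`tsp`) rather than moving its values, and a positive `k` moves the series *earlier* in time. Writing the result into a column by position therefore just copies the unshifted values, which matches the "What I get" output. A minimal base-R sketch, assuming tsbox isn't required and building an equivalent monthly `ts` matrix by hand, that uses `k = -1` and lets `cbind()` align the columns by time instead:

```r
# Base R only: a monthly multi-column ts equivalent to the question's data,
# built by hand here rather than via tsbox.
timeseries <- ts(cbind(col1 = c(1, 2, 3, 4, 5), col2 = c(1, 2, 3, 4, 5)),
                 start = c(2005, 1), frequency = 12)

# k = -1 delays the series by one period: the same values are re-stamped
# one month later (Feb-Jun 2005); the data vector itself is unchanged.
col1_lagged <- stats::lag(timeseries[, "col1"], k = -1)

# cbind() on ts objects takes the union of the time ranges and pads with NA,
# so the lagged values line up with the right months. window() then drops
# the extra June 2005 row introduced by the shift.
out <- window(cbind(timeseries, col1_lagged),
              start = start(timeseries), end = end(timeseries))
colnames(out) <- c("col1", "col2", "col1_L1")
out
```

Because the alignment is done by time stamps rather than positions, the same pattern should work for any `k`, with `window()` trimming whatever rows the shift adds.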
PencilBox
  • There are a couple of typos in the post. 1) The `Date` bracket needs to be closed in the data frame creation. 2) In `as.POSIXct(d[,'Date'],format='%Y-%m-%d')` I think you meant `data` instead of `d`. After correcting those two typos, when I run `timeseries = ts_ts(ts_long(data))` I get the error `Error: time column needs to be specified as the first date of the period` – Ronak Shah Jun 29 '21 at 03:27
  • Thanks. Fixed. Strange, I don't get that error. – PencilBox Jun 29 '21 at 03:32
  • Can you not simply use `data$col1 <- dplyr::lag(data$col1)`? – Ronak Shah Jun 29 '21 at 04:06
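If dplyr can't be installed offline, the same positional lag can be written in base R. A minimal sketch, assuming the rows are already sorted by date with exactly one row per month; it uses `as.Date()` instead of `as.POSIXct()` for simplicity, and the new column name `col1_L1` is taken from the question's expected output:

```r
data <- data.frame(Date = as.Date(c("2005-01-01", "2005-02-01", "2005-03-01",
                                    "2005-04-01", "2005-05-01")),
                   col1 = c(1, 2, 3, 4, 5),
                   col2 = c(1, 2, 3, 4, 5))

# Positional lag by one row: prepend NA and drop the last value.
# head(x, -1) returns every element of x except the last one.
data$col1_L1 <- c(NA, head(data$col1, -1))
data
```

Unlike the suggestion in the comment, this writes to a new column, so `col1` itself is kept.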

1 Answer


I wasn't able to reproduce your example (likely due to the reasons pointed out in the comments), but perhaps you could use the function from this post, e.g.:

data=data.frame(Date=c('2005-01-01','2005-02-01','2005-03-01','2005-04-01','2005-05-01'),
                col1 = c(1,2,3,4,5),
                col2 = c(1,2,3,4,5))
data[,'Date']= as.POSIXct(data[,'Date'],format='%Y-%m-%d')

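# Shift x by k positions, padding with NA:
# k > 0 lags (values move later), k < 0 leads (values move earlier).
lagpad <- function(x, k) {
  n <- length(x)
  k <- max(min(k, n), -n)  # clamp so |k| never exceeds length(x)
  if (k >= 0) {
    c(rep(NA, k), x)[seq_len(n)]         # NAs at the front, drop the tail
  } else {
    c(x, rep(NA, -k))[(-k + 1):(n - k)]  # NAs at the end, drop the head
  }
}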

data$col1_L1 <- lagpad(data$col1, 1)
data
#>         Date col1 col2 col1_L1
#> 1 2005-01-01    1    1      NA
#> 2 2005-02-01    2    2       1
#> 3 2005-03-01    3    3       2
#> 4 2005-04-01    4    4       3
#> 5 2005-05-01    5    5       4
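Since the question started from a `ts` matrix, the same positional idea can be wrapped so that the lagged column goes straight back into a monthly `ts`. A minimal sketch: `lag_ts_column()` is a hypothetical helper name, not part of base R or tsbox, and it assumes a regular series with no missing periods, because it shifts by row rather than by date:

```r
# Hypothetical helper: add a k-step lagged copy of column `col` to a ts matrix.
# k > 0 lags, k < 0 leads; rows with no source value become NA.
lag_ts_column <- function(x, col, k = 1, name = paste0(col, "_L", k)) {
  n <- nrow(x)
  idx <- seq_len(n) - k                  # source row for each target row
  idx[idx < 1 | idx > n] <- NA           # out-of-range sources become NA
  vals <- as.numeric(x[, col])[idx]
  out <- cbind(unclass(x), vals)         # plain matrix cbind (no ts alignment)
  colnames(out)[ncol(out)] <- name
  # Restore the original start date and frequency.
  ts(out, start = start(x), frequency = frequency(x))
}

timeseries <- ts(cbind(col1 = c(1, 2, 3, 4, 5), col2 = c(1, 2, 3, 4, 5)),
                 start = c(2005, 1), frequency = 12)
timeseries <- lag_ts_column(timeseries, "col1", k = 1)
timeseries
```

The default column name follows the question's `col1_L1` convention. For a lead (negative `k`), passing an explicit `name` gives a cleaner label.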
jared_mamrot
  • Thanks! I find it interesting that this isn't something that is built-in. For example, `.shift()` with Pandas in Python makes it trivial and takes into account datetime indexes. Even Stata makes it really easy and keeps track of all the dates for you. – PencilBox Jun 29 '21 at 03:45
  • You're welcome. I typically use the zoo, lubridate and/or data.table packages when working with time/date data, which all have "lag" functions and a plethora of other useful functions, but not having an internet connection complicates things. Anyways, I'm glad you got your problem sorted. This isn't really a duplicate of https://stackoverflow.com/questions/3558988/basic-lag-in-r-vector-dataframe, so I don't think it will be closed, but you should go and upvote that question/answer anyway. – jared_mamrot Jun 29 '21 at 03:53
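On the point raised in the comments about pandas `.shift()` and Stata keeping track of dates: base R can also do a date-aware lag without extra packages. A minimal sketch, assuming first-of-month `Date` values. The example data is hypothetical and deliberately skips March 2005, so that a missing month gives `NA` instead of silently borrowing the previous row:

```r
# Hypothetical example data with a gap: March 2005 is missing.
data <- data.frame(Date = as.Date(c("2005-01-01", "2005-02-01", "2005-04-01")),
                   col1 = c(1, 2, 4))

# For each row, compute the calendar date exactly one month earlier.
# seq() on a Date with by = "-1 month" steps back by calendar months.
prev_date <- do.call(c, lapply(data$Date, function(d)
  seq(d, by = "-1 month", length.out = 2)[2]))

# Look up the value recorded on that earlier date; if that month is absent
# from the data, match() returns NA and so does the lag.
data$col1_L1 <- data$col1[match(prev_date, data$Date)]
data
```

With this data, the April row gets `NA` because there is no March observation, whereas a purely positional lag would have used February's value.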