Basic lag in R vector/dataframe

Question

This will most likely expose that I am new to R, but in SPSS, running lags is very easy. Obviously this is user error, but what am I missing?

x <- sample(c(1:9), 10, replace = T)
y <- lag(x, 1)
ds <- cbind(x, y)
ds

Results in:

      x y
 [1,] 4 4
 [2,] 6 6
 [3,] 3 3
 [4,] 4 4
 [5,] 3 3
 [6,] 5 5
 [7,] 8 8
 [8,] 9 9
 [9,] 3 3
[10,] 7 7

I figured I would see:

     x y
 [1,] 4 
 [2,] 6 4
 [3,] 3 6
 [4,] 4 3
 [5,] 3 4
 [6,] 5 3
 [7,] 8 5
 [8,] 9 8
 [9,] 3 9
[10,] 7 3

Any guidance will be much appreciated.

score 36 · Answer 1 · edited Mar 23 '17 at 12:47

I had the same problem, but I didn't want to use zoo or xts, so I wrote a simple lag function for data frames:

lagpad <- function(x, k) {
  if (k>0) {
    return (c(rep(NA, k), x)[1 : length(x)] );
  }
  else {
    return (c(x[(-k+1) : length(x)], rep(NA, -k)));
  }
}

This can lag forward or backwards:

x<-1:3;
(cbind(x, lagpad(x, 1), lagpad(x,-1)))
     x      
[1,] 1 NA  2
[2,] 2  1  3
[3,] 3  2 NA

edited Mar 23 '17 at 12:47

flexponsive

answered Oct 29 '12 at 19:58

Andrew

Let's say I wanted to do this function on a vector but perform it recursively for multiple lags `lagpad(x,-1:-216)` and output that information into one dataframe (e.g. lagpad(x,-1) becomes variable #1 of the df, lagpad(x,-2) becomes variable #2 of the df, lagpad(x,-3) becomes variable #3 of the df, and so on). Would I have to cbind 216 columns or is there a shorter way to adapt your code to this scenario? – Danielle Sep 27 '17 at 19:56
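
One possible way to handle the case in the comment above without writing 216 cbind calls is to apply the function over a vector of lags and bind the results. This is only a sketch: it reuses lagpad from the answer, and the short demo vector and the column names (lead1, lead2, ...) are assumptions made for illustration.

```r
# lagpad as defined in the answer above
lagpad <- function(x, k) {
  if (k > 0) {
    c(rep(NA, k), x)[1:length(x)]
  } else {
    c(x[(-k + 1):length(x)], rep(NA, -k))
  }
}

x <- 1:5
lags <- -1:-3   # would be -1:-216 in the comment's case

# sapply returns a matrix with one column per lag; wrap it in a data frame
out <- as.data.frame(sapply(lags, function(k) lagpad(x, k)))
names(out) <- paste0("lead", abs(lags))   # hypothetical column names
out
```

Note that lagpad itself is not vectorised over k, which is why the lags are looped over rather than passed in as `-1:-216` directly.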

score 29 · Accepted Answer · answered Aug 24 '10 at 17:28

Another way to deal with this is using the zoo package, which has a lag method that will pad the result with NA:

> require(zoo)
> set.seed(123)
> x <- zoo(sample(c(1:9), 10, replace = T))
> y <- lag(x, -1, na.pad = TRUE)
> cbind(x, y)
   x  y
1  3 NA
2  8  3
3  4  8
4  8  4
5  9  8
6  1  9
7  5  1
8  9  5
9  5  9
10 5  5

The result is a multivariate zoo object (which is an enhanced matrix), but easily converted to a data.frame via

> data.frame(cbind(x, y))

answered Aug 24 '10 at 17:28

Gavin Simpson

Also note that if z is a zoo series then lag(z, 0:-1) is a two column zoo series with the original series and a lagged series. Also, coredata(z) will return just the data part of a zoo series and as.data.frame(z) will return a data frame with the data part of z as the column contents. – G. Grothendieck Aug 25 '10 at 04:23
Am I the only one finding that zoo is getting k backwards? In this example k=-1 is negative so I would expect y to be leading, but it's in fact lagging behind x. The default is k=1 so if I write "y = lag(x)", I end up with y leading x. This is... misleading. – Thrastylon Aug 20 '20 at 20:59
zoo's design principles include consistency with base R and in base R a positive lag causes the series to start earlier. See ?lag – G. Grothendieck Aug 20 '20 at 22:45
@G.Grothendieck, just came to this post with a similar problem and tried running your accepted solution, but got this error: `Error: "n" must be a nonnegative integer scalar, not an integer vector of length 1.` Changing the `-1` to `1` eliminates the error, but raises the question as to whether something has changed since you wrote this solution -- of which readers of this post should be aware. Care to comment? Thanks. – W Barker Apr 04 '22 at 14:20
@W Barker, You likely introduced an error by loading dplyr which clobbers `lag` in the base of R. Use `library(dplyr, exclude = c("filter", "lag"))` or don't load dplyr. – G. Grothendieck Apr 04 '22 at 14:41
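
The masking described in the comment above can also be sidestepped by naming the namespace explicitly. A minimal sketch, assuming dplyr may or may not be installed (the dplyr call is guarded for that reason); the demo vector is made up:

```r
x <- c(3, 8, 4)

# stats::lag always refers to the base R version, even after library(dplyr)
s <- stats::lag(ts(x), -1)
tsp(s)   # start, end, frequency: the time base moved, the values did not

# dplyr::lag shifts the values themselves and pads with NA
if (requireNamespace("dplyr", quietly = TRUE)) {
  d <- dplyr::lag(x, 1)
  print(d)
}
```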

score 15 · Answer 3 · answered Aug 24 '10 at 17:15

lag does not shift the data, it only shifts the "time base". x has no time base, so cbind does not work as you expected. Try cbind(as.ts(x), lag(x)) and notice that a "lag" of 1 moves the time base back by one period, so each value appears one period earlier.

I would suggest using zoo / xts for time series. The zoo vignettes are particularly helpful.
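
To see the point about the time base concretely, here is a small base R sketch. The demo vector and the names xt and yt are made up for illustration:

```r
x <- c(4, 6, 3, 4, 3)
xt <- ts(x)               # time base 1..5
yt <- stats::lag(xt, 1)   # same values, time base 0..4

identical(as.vector(yt), x)   # the data itself did not move
tsp(xt)                       # start, end, frequency of the original
tsp(yt)                       # start and end are one period earlier

# cbind on ts objects aligns rows by time, which is where the shift shows up
cbind(xt, yt)
```

With a plain vector there is no time base to align on, which is why the cbind in the question shows two identical columns.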

answered Aug 24 '10 at 17:15

Joshua Ulrich

Neither `zoo` nor `xts` seems to be stock, where do I get them? – zwol Aug 24 '10 at 17:19

`install.packages("xts") # this will install zoo as well` – Joshua Ulrich Aug 24 '10 at 17:20

score 9 · Answer 4 · answered Dec 02 '14 at 00:11

Using just standard R functions this can be achieved in a much simpler way:

x <- sample(c(1:9), 10, replace = T)
y <- c(NA, head(x, -1))
ds <- cbind(x, y)
ds

answered Dec 02 '14 at 00:11

Alexander Radev

score 7 · Answer 5 · answered Oct 13 '15 at 11:56

The easiest way to me now appears to be the following:

require(dplyr)
df <- data.frame(x = sample(c(1:9), 10, replace = T))
df <- df %>% mutate(y = lag(x))

answered Oct 13 '15 at 11:56

matt_jay

Yes! In any context it seems, just swap dplyr::lag for standard lag and then works fine on non time series... job done! – TickboxPhil Apr 09 '19 at 15:29

score 7 · Answer 6 · edited May 23 '17 at 11:55

lag() works with time series, whereas you are trying to use bare matrices. This old question suggests using embed instead, like so:

lagmatrix <- function(x,max.lag) embed(c(rep(NA,max.lag), x), max.lag+1)

for instance

> x
[1] 8 2 3 9 8 5 6 8 5 8
> lagmatrix(x, 1)
      [,1] [,2]
 [1,]    8   NA
 [2,]    2    8
 [3,]    3    2
 [4,]    9    3
 [5,]    8    9
 [6,]    5    8
 [7,]    6    5
 [8,]    8    6
 [9,]    5    8
[10,]    8    5
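
For more than one lag, the same lagmatrix helper returns one column per lag. A short sketch; the demo vector and the column names are made up for illustration:

```r
lagmatrix <- function(x, max.lag) embed(c(rep(NA, max.lag), x), max.lag + 1)

x <- 1:4
m <- lagmatrix(x, 2)
colnames(m) <- c("x", "lag1", "lag2")   # hypothetical names
m
```

The first column is the original series and each further column is lagged by one more step, padded with NA at the top.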

score 2 · Answer 7 · answered Mar 18 '13 at 18:34

tmp<-rnorm(10)
tmp2<-c(NA,tmp[1:(length(tmp)-1)])  # parentheses matter: 1:length(tmp)-1 parses as (1:length(tmp))-1
tmp
tmp2

answered Mar 18 '13 at 18:34

Paweł Sakowski

score 2 · Answer 8 · answered Oct 23 '13 at 19:37

This should accommodate vectors or matrices as well as negative lags:

lagpad <- function(x, k=1) {
  i<-is.vector(x)
  if(is.vector(x)) x<-matrix(x) else x<-matrix(x,nrow(x))
  if(k>0) {
      x <- rbind(matrix(rep(NA, k*ncol(x)),ncol=ncol(x)), matrix(x[1:(nrow(x)-k),], ncol=ncol(x)))
  }
  else {
      x <- rbind(matrix(x[(-k+1):(nrow(x)),], ncol=ncol(x)),matrix(rep(NA, -k*ncol(x)),ncol=ncol(x)))
  }
  if(i) x[1:length(x)] else x
}

score 2 · Answer 9 · answered Jan 10 '20 at 10:37

Using data.table:

> x <- sample(c(1:9), 10, replace = T)
> y <- data.table::shift(x)
> ds <- cbind(x, y)
> ds
      x  y
 [1,] 5 NA
 [2,] 4  5
 [3,] 3  4
 [4,] 3  3
 [5,] 4  3
 [6,] 8  4
 [7,] 1  8
 [8,] 7  1
 [9,] 9  7
[10,] 7  9

answered Jan 10 '20 at 10:37

AKRosenblad

score 0 · Answer 10 · answered Oct 27 '16 at 13:26

A simple way to do the same is to copy the data to a new data frame and shift its row index. Make sure the original table is indexed sequentially with no gaps.

e.g.

tempData <- originalData
rownames(tempData) <- 2:(nrow(tempData)+1)

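If you want it in the same data frame as the original, join the two on their row names (for example with merge or match); a plain cbind binds rows by position and would ignore the shifted index.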
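
Because cbind binds rows by position, the shifted row names only have an effect once the two tables are aligned on them. A minimal sketch of one way to do that with match; the demo data and the column name x_lag are assumptions for illustration:

```r
originalData <- data.frame(x = c(4, 6, 3))

tempData <- originalData
rownames(tempData) <- 2:(nrow(tempData) + 1)   # shift the index by one

# look up each original row name in the shifted copy; no match gives NA
idx <- match(rownames(originalData), rownames(tempData))
originalData$x_lag <- tempData$x[idx]          # x_lag is a hypothetical name
originalData
```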

score 0 · Answer 11 · answered Dec 20 '18 at 15:34

Two options, in base R and with data.table:

baseShiftBy1 <- function(x) c(NA, x[-length(x)])
baseShiftBy1(x)
[1] NA  3  8  4  8  9  1  5  9  5

data.table::shift(x)
[1] NA  3  8  4  8  9  1  5  9  5

Data:

set.seed(123)
(x <- sample(c(1:9), 10, replace = T))
[1] 3 8 4 8 9 1 5 9 5 5

score 0 · Answer 12 · answered Aug 20 '20 at 21:18

I went with a similar solution to Andrew's (dedicated function instead of xts or zoo), but with a terser formulation that I find easier to reason about:

lagpad <- function(x, k) {
  if (k == 0) { return(x) }
  k.pos <- max(0, k)
  k.neg <- max(0, -k)
  c(rep(NA, k.pos), head(x, -k.pos),  # empty if k<0, else lagging x
    tail(x, -k.neg), rep(NA, k.neg))  # empty if k>0, else leading x
}
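
A quick check of this version in both directions (the function is repeated so the snippet runs on its own; the demo vector is made up):

```r
lagpad <- function(x, k) {
  if (k == 0) { return(x) }
  k.pos <- max(0, k)
  k.neg <- max(0, -k)
  c(rep(NA, k.pos), head(x, -k.pos),
    tail(x, -k.neg), rep(NA, k.neg))
}

x <- 1:5
lagged  <- lagpad(x, 2)    # NA NA  1  2  3
leading <- lagpad(x, -2)   #  3  4  5 NA NA
lagged
leading
```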

score -1 · Answer 13 · edited Jun 12 '15 at 00:42

Just get rid of lag. Change your line for y to:

y <- c(NA, x[-1])

edited Jun 12 '15 at 00:42

chollida

answered Aug 24 '10 at 20:20

frankc

this is not correct! Probably you wanted to say `y <- c(NA, head(x, -1))` – Tomas Oct 13 '11 at 19:03
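
The difference the comment points at is easy to see on a small made-up vector: x[-1] drops the first element, while head(x, -1) drops the last one.

```r
x <- c(4, 6, 3, 4)

a <- c(NA, x[-1])         # NA 6 3 4: x with its first value overwritten, not a lag
b <- c(NA, head(x, -1))   # NA 4 6 3: x shifted down by one position
a
b
```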
