
I want to use the `lag` function from base R because I'm using it in a first-year BSc class I teach and don't want to introduce too many packages. However, I can't really understand how it works.

x <- data.frame(1:10)
x$l1 <- lag(x[,1], k=1)
x$l2 <- lag(x[,1], k=-1)

This produces:

   X1.10 l1 l2
1      1  1  1
2      2  2  2
3      3  3  3
4      4  4  4
5      5  5  5
6      6  6  6
7      7  7  7
8      8  8  8
9      9  9  9
10    10 10 10

I was expecting output that was one element shorter than x[,1], and preferably that R would insert an NA for the first observation, where I don't have a lagged value.

I want to calculate autocorrelation using the `cor` function, but that doesn't work with this behavior of the `lag` function. Example using other data:

cor(Price[,1],lag(Price[,1], k = 1), use = "complete.obs")

This always returns an (auto)correlation of 1.

These are first-year BSc students, so I don't want to use too many packages or commands.

Follow-up: I'm aware of the following possibility, but don't find it satisfactory for my students.

n <- nrow(x)
c(NA, x[1:(n-1), 1])
 [1] NA  1  2  3  4  5  6  7  8  9
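A variant of the same idea, as a minimal sketch that uses only indexing: negative indices drop an element, so the shift can be written without `1:(n-1)`. The column names `v`, `lag1` and `lead1` are made up for illustration and are not from the original data.

```r
# Illustrative data frame; the column name `v` is an assumption
x <- data.frame(v = 1:10)
n <- nrow(x)

# Previous value: drop the last element and pad the front with NA
x$lag1 <- c(NA, x$v[-n])

# Next value: drop the first element and pad the end with NA
x$lead1 <- c(x$v[-1], NA)

# cor() skips the row containing NA when use = "complete.obs"
r <- cor(x$v, x$lag1, use = "complete.obs")
```

Because the padded columns have the same length as the original, they can be stored in the data frame and passed straight to `cor`.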
Dagfinn Rime
  • When I run your code, I do get `NA` values in the first row for `l1` and `l2`. – andrew_reece Sep 20 '20 at 16:22
  • As @andrew_reece said I also got `NA` maybe `dplyr` is loaded and it is creating some issues! – Duck Sep 20 '20 at 16:25
  • Also - the correlation is 1 because `x[,1]` and `lag(x[,1], k=1)` only differ by an offset, but each vector increases by 1 for each of its elements. For example, `a <- 1:10; b <- 2:11; cor(a, b)` produces a correlation of 1. – andrew_reece Sep 20 '20 at 16:26
  • Regarding your `lag` with `k = -1`, did you intend for this vector to have `NA` at the last element? If so, use `lead` from dplyr (its shift argument is `n`, not `k`): `x$l2 <- lead(x[,1], n=1)` – andrew_reece Sep 20 '20 at 16:29
  • And you don't want to use `acf` which is purpose built to compute auto correlation? – Chuck P Sep 20 '20 at 16:30
  • Per Chuck P's comment, `acf(x, lag.max = 1, plot = FALSE)` is the approach in base R. But maybe you want your students to derive auto-correlation without a convenience function? – andrew_reece Sep 20 '20 at 16:32
  • Thanks for the responses. I restarted RStudio and issued `rm(list=ls())`. I also tried in RGui, and again got the same result. The `cor` example was not for this data, but for other data. As for `k=-1`, I did that just in case I had misinterpreted the function. I encountered this when I reviewed my lecture, where I wanted to use the `cor` function to illustrate the autocorrelation of the level and the change of a random walk. However, when I first made the program for the students on my work PC, I remember that everything came out as expected. What can be wrong? – Dagfinn Rime Sep 20 '20 at 16:36
  • Yes, I'm aware of the `acf`-function, but don't want to expose the students to too many functions at this stage. – Dagfinn Rime Sep 20 '20 at 16:38
  • `lag` lags the time index, not the series itself, so pass it a time series. This also means that lagging by k=1 starts the series 1 unit earlier, which has the effect of moving the series forward, not backward. Try: `tt <- ts(x); tt2 <- na.omit(cbind(tt, lag(tt, -1))); cor(tt2[, 1], tt2[, 2])`. `lag` is a generic, so packages can define their own `lag` methods (e.g. the zoo package), but beware that the xts package defines `k` with the opposite sign to base R, and dplyr completely clobbers it, so use `library(dplyr, exclude = c("lag", "filter"))` if you need dplyr and then `dplyr::lag` for its lag. – G. Grothendieck Sep 20 '20 at 16:55
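
The last comment can be tried out as follows. This is a minimal sketch, assuming a fresh session where `lag` is the base R (stats) version and not masked by dplyr. The random walk is simulated purely for illustration, and the names `rw`, `d`, `level` and `change` are made up here.

```r
# On a plain vector, lag() returns the same values and only shifts the
# "tsp" (time) attribute, so correlating a vector with its lag() gives 1
v <- 1:10
same_values <- identical(as.vector(lag(v, k = 1)), v)

set.seed(1)                      # arbitrary seed, only for reproducibility
rw <- ts(cumsum(rnorm(200)))     # simulated random walk (illustrative data)

# cbind() on ts objects aligns the series by time, so the shifted time
# index now matters; na.omit() drops the unmatched first and last rows
level <- na.omit(cbind(rw, lag(rw, -1)))
r_level <- cor(level[, 1], level[, 2])     # autocorrelation of the level

d <- diff(rw)                    # changes of the random walk
change <- na.omit(cbind(d, lag(d, -1)))
r_change <- cor(change[, 1], change[, 2])  # autocorrelation of the change
```

The key point is that the shift only takes effect once the series are combined as `ts` objects; assigning the result of `lag` into a data frame column keeps the values in their original positions.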
