
What is the most efficient way to make a matrix of lagged variables in R for an arbitrary variable (i.e. not a regular time series)?

For example:

Input:

x <- c(1,2,3,4) 

2 lags, output:

[1,NA, NA]
[2, 1, NA]
[3, 2,  1]
[4, 3,  2]
jogo
James in Ottawa

4 Answers

21

You can achieve this with the built-in embed() function. Its second argument, dimension, is the number of columns to produce, i.e. the number of lags plus one (so 3 for the 2 lags in your example):

x <- c(NA,NA,1,2,3,4)
embed(x,3)

## returns
     [,1] [,2] [,3]
[1,]    1   NA   NA
[2,]    2    1   NA
[3,]    3    2    1
[4,]    4    3    2

embed() was discussed in a previous answer by Joshua Reich. (Note that I prepended NAs to x to replicate your desired output.)

It's not particularly well-named but it is quite useful and powerful for operations involving sliding windows, such as rolling sums and moving averages.
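As a minimal sketch of the approach above, the NA padding and the call to embed() can be wrapped in a small helper. The name lag_matrix and its lags parameter are hypothetical, chosen only for illustration; the last lines show the sliding-window use, where each row of embed()'s result is one window.

```r
# Hypothetical helper (not part of base R): builds a matrix whose first
# column is x and whose column k + 1 is x lagged by k steps.
lag_matrix <- function(x, lags) {
  # Pad the front with NAs so the result keeps one row per element of x.
  padded <- c(rep(NA, lags), x)
  # embed() needs one column for x itself plus one per lag.
  embed(padded, lags + 1)
}

m <- lag_matrix(c(1, 2, 3, 4), lags = 2)

# Sliding-window use: without padding, each row is a complete window of
# width 3, so rowSums() gives a rolling sum and rowMeans() a moving average.
roll_sum <- rowSums(embed(1:5, 3))   # 6 9 12
roll_mean <- rowMeans(embed(1:5, 3))
```

Padding is a design choice: without it, embed() drops the first lags rows instead of filling them with NA.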

medriscoll
9

Use a proper class for your objects; base R has the ts class, which has a lag() method. Note that ts objects date from a time when 'delta' or 'frequency' were constant: monthly or quarterly data, as in macroeconomic series.
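For illustration, here is one way the ts route might produce the matrix from the question, using only base R. This is a sketch that assumes the data can be treated as a regularly spaced series; the column names lag0, lag1 and lag2 are made up for the example.

```r
x <- ts(c(1, 2, 3, 4))

# stats::lag() shifts the time index rather than the values; a negative k
# moves the series later in time, so at time t it holds x[t - k steps back].
# ts.union() aligns the shifted series on a common time axis, padding with NA.
aligned <- ts.union(
  lag0 = x,
  lag1 = stats::lag(x, k = -1),
  lag2 = stats::lag(x, k = -2)
)

# Keep only the time span of the original series (4 rows).
lagged <- window(aligned, start = start(x), end = end(x))
```

Calling stats::lag() with its namespace avoids surprises if another package that defines its own lag() is attached.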

For irregular data such as (business-)daily series, use the zoo or xts packages, which can also deal (very well!) with lags. To go further from there, packages like dynlm or dlm allow for dynamic regression models with lags.

The Task Views on Time Series, Econometrics, and Finance all have further pointers.

Dirk Eddelbuettel
2

The running function in the gtools package does more or less what you want:

> require("gtools")
> running(1:4, fun=I, width=3, allow.fewer=TRUE)

$`1:1`
[1] 1

$`1:2` 
[1] 1 2

$`1:3` 
[1] 1 2 3

$`2:4` 
[1] 2 3 4
Jonathan Chang
  • But James wanted a matrix not a list. You could package the result using matrix(unlist(...)) but the embed() function does it in one step. – Rob Hyndman Aug 23 '09 at 05:56
  • Totally right, which is why I upvoted the embed() solution when it came out =). But 'running' is still a useful function I think --- most of the time when I wanted to create the matrix James asked for, what I really wanted to do was run apply on it. – Jonathan Chang Aug 23 '09 at 16:46
1

The method that works best for me is to use the lag function from the dplyr package.

Example:

> require(dplyr)
> lag(1:10, 1)
 [1] NA  1  2  3  4  5  6  7  8  9
> lag(1:10, 2)
 [1] NA NA  1  2  3  4  5  6  7  8
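To get the matrix asked for in the question, the lagged vectors can be bound together column by column. A small sketch, assuming dplyr is installed; the column names lag0, lag1 and lag2 are chosen only for this example.

```r
x <- c(1, 2, 3, 4)

# dplyr::lag() is called with its namespace because base R's stats::lag()
# has the same name and behaves differently on plain vectors.
lagged <- cbind(
  lag0 = x,
  lag1 = dplyr::lag(x, 1),
  lag2 = dplyr::lag(x, 2)
)
```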
I Like to Code