
Goal

I want to take a long numeric vector and create a matrix in which each column is a successive offset (lag or lead) of the original vector. If n is the maximum offset, the matrix will have dimensions [length(vector), n * 2 + 1], because we want offsets in both directions plus the 0 offset, i.e. the original vector.

Example

To illustrate, consider the following vector:

test <- c(2, 8, 1, 10, 7, 5, 9, 3, 4, 6)

[1]  2  8  1 10  7  5  9  3  4  6

Expected output

Now we create offsets of values, let's say for n == 3:

      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]   NA   NA   NA    2    8    1   10
 [2,]   NA   NA    2    8    1   10    7
 [3,]   NA    2    8    1   10    7    5
 [4,]    2    8    1   10    7    5    9
 [5,]    8    1   10    7    5    9    3
 [6,]    1   10    7    5    9    3    4
 [7,]   10    7    5    9    3    4    6
 [8,]    7    5    9    3    4    6   NA
 [9,]    5    9    3    4    6   NA   NA
[10,]    9    3    4    6   NA   NA   NA

I am looking for an efficient solution. data.table or tidyverse solutions are more than welcome.

Returning only the rows that have no NAs (i.e. rows 4 to 7) is also OK.

Current solution

lags  <- lapply(3:1, function(x) dplyr::lag(test, x))
leads <- lapply(1:3, function(x) dplyr::lead(test, x))
l <- c(lags, list(test), leads)  # wrap test so it stays a single list element
matrix(unlist(l), nrow = length(test))
Axeman
  • Also `library(data.table); data.table(test)[, c(shift(test, 3:1), shift(test, 0:3, type = "lead"))]` or, if you want a matrix without an intermediate step, maybe `do.call(cbind, c(shift(test, 3:1), shift(test, 0:3, type = "lead")))`. Also see [this](https://stackoverflow.com/questions/28055927/how-can-i-automatically-create-n-lags-in-a-timeseries/) and [this](https://stackoverflow.com/questions/27485384/reshape-of-time-series-in-r/). Also, I doubt the efficiency of `embed`, as it is a for loop internally. I would run some benchmarks if efficiency really matters. – David Arenburg Jul 03 '17 at 16:49
  • @DavidArenburg, there only seems to be a `for` loop when passing a matrix. For the case of the vector I think it is a single indexing call. – Axeman Jul 04 '17 at 07:44

3 Answers


In base R, you can use embed to get rows 4 through 7. You have to reverse the column order, however.

embed(test, 7)[, 7:1]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    2    8    1   10    7    5    9
[2,]    8    1   10    7    5    9    3
[3,]    1   10    7    5    9    3    4
[4,]   10    7    5    9    3    4    6

data

test <- c(2, 8, 1, 10, 7, 5, 9, 3, 4, 6)
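
The window width 7 used above is 2 * n + 1 for n = 3. As a minimal sketch, the call can be generalised to any maximum offset n; the function name `offset_window` is hypothetical and not part of base R:

```r
# Hypothetical helper: offset matrix restricted to the rows without NA.
offset_window <- function(x, n) {
  width <- 2L * n + 1L
  stopifnot(length(x) >= width)
  # embed() places the most recent value first, so reverse the columns.
  embed(x, width)[, width:1, drop = FALSE]
}

test <- c(2, 8, 1, 10, 7, 5, 9, 3, 4, 6)
offset_window(test, 3)
```

With `drop = FALSE` the result stays a matrix even when only a single row or column remains.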
lmo

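This produces the full matrix, including the NA-padded rows: pad the vector with n NAs on each side, call embed, then transpose and reverse the row order.

n <- 3
t(embed(c(rep(NA, n), test, rep(NA, n)), length(test)))[length(test):1, ]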

      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]   NA   NA   NA    2    8    1   10
 [2,]   NA   NA    2    8    1   10    7
 [3,]   NA    2    8    1   10    7    5
 [4,]    2    8    1   10    7    5    9
 [5,]    8    1   10    7    5    9    3
 [6,]    1   10    7    5    9    3    4
 [7,]   10    7    5    9    3    4    6
 [8,]    7    5    9    3    4    6   NA
 [9,]    5    9    3    4    6   NA   NA
[10,]    9    3    4    6   NA   NA   NA
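
Another way to read the result is that cell [i, j] holds test[i + j - n - 1], with NA wherever that position falls outside the vector. The sketch below builds the index matrix directly with outer(); this is a different technique from the embed() call above, and the name `offset_matrix` is hypothetical:

```r
# Hypothetical helper: full offset matrix via explicit index arithmetic.
offset_matrix <- function(x, n) {
  # idx[i, j] = i + offset_j, with offsets running from -n to n.
  idx <- outer(seq_along(x), seq(-n, n), `+`)
  # Positions outside 1..length(x) have no value.
  idx[idx < 1 | idx > length(x)] <- NA
  # Indexing with NA yields NA; matrix() restores the shape column by column.
  matrix(x[idx], nrow = length(x))
}

test <- c(2, 8, 1, 10, 7, 5, 9, 3, 4, 6)
offset_matrix(test, 3)
```

Whether this is faster than embed() on a long vector is not something assumed here; it would need benchmarking.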
Andrew Gustar

This can be solved by constructing the matrix from a long vector and returning only the wanted columns and rows:

test <- c(2, 8, 1, 10, 7, 5, 9, 3, 4, 6)
n_offs <- 3L
n_row <- length(test) + n_offs + 1L
matrix(rep(c(rep(NA, n_offs), test), n_row), nrow = n_row)[1:length(test), 1:(n_offs * 2L + 1L)]
      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]   NA   NA   NA    2    8    1   10
 [2,]   NA   NA    2    8    1   10    7
 [3,]   NA    2    8    1   10    7    5
 [4,]    2    8    1   10    7    5    9
 [5,]    8    1   10    7    5    9    3
 [6,]    1   10    7    5    9    3    4
 [7,]   10    7    5    9    3    4    6
 [8,]    7    5    9    3    4    6   NA
 [9,]    5    9    3    4    6   NA   NA
[10,]    9    3    4    6   NA   NA   NA

A variant that returns the same result as embed(test, 7)[, 7:1], i.e. only the rows without NAs, is:

matrix(rep(test, length(test) + 1L), nrow = length(test) + 1L)[
  seq_len(length(test) - 2L * n_offs), seq_len(n_offs * 2L + 1L)]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    2    8    1   10    7    5    9
[2,]    8    1   10    7    5    9    3
[3,]    1   10    7    5    9    3    4
[4,]   10    7    5    9    3    4    6
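
Both calls rely on column-major filling: when the repeated vector is one element shorter than the number of rows, each column starts one position later than the previous one. A small sketch with a toy vector (`v` is illustrative, not from the answer) makes the shift visible:

```r
v <- 1:4
# 5 rows but a vector of length 4: every column is shifted by one position.
m <- matrix(rep(v, 5), nrow = 5)
m
```

Reading across any row of `m` therefore gives consecutive elements of `v`, wrapping around at the end, which is why the leading NAs in the padded version also serve as the trailing NAs.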
Uwe