Vectorialised column addition

Question

Given this tibble:

tibble(x = c(1:9))

I want to add a column x_lag_1 = c(NA,1:8), a column x_lag_2 = c(NA,NA,1:7), etc.

Up to x_lag_n.

score 3 · Answer 1 · answered Dec 28 '21 at 03:42

This can be quick with data.table:

library(data.table)

n <- seq(4)
setDT(df)[, paste0('x_lag_', n) := shift(x, n)]

df
  x x_lag_1 x_lag_2 x_lag_3 x_lag_4
1: 1      NA      NA      NA      NA
2: 2       1      NA      NA      NA
3: 3       2       1      NA      NA
4: 4       3       2       1      NA
5: 5       4       3       2       1
6: 6       5       4       3       2
7: 7       6       5       4       3
8: 8       7       6       5       4
9: 9       8       7       6       5

score 1 · Answer 2 · answered Dec 28 '21 at 02:19

1

You may use map_dfc to add n new columns.

library(dplyr)
library(purrr)

df <- tibble(x = c(1:9))
n <- 3
bind_cols(df, map_dfc(seq_len(n), ~df %>% 
                      transmute(!!paste0('x_lag', .x) := lag(x, .x))))

#      x x_lag1 x_lag2 x_lag3
#  <int>  <int>  <int>  <int>
#1     1     NA     NA     NA
#2     2      1     NA     NA
#3     3      2      1     NA
#4     4      3      2      1
#5     5      4      3      2
#6     6      5      4      3
#7     7      6      5      4
#8     8      7      6      5
#9     9      8      7      6

answered Dec 28 '21 at 02:19

Ronak Shah

377,200
20
156
213

is there a reason why this formula does not respect groups in the df, or a way to enforce them? – GiulioGCantone Dec 29 '21 at 23:53
No, it does not work properly, you can check it. ` df <- tibble(a = c(1:9), y = c(rep("A",5),rep("B",4)) ) %>% group_by(y)` `bind_cols(df, map_dfc(seq(2), ~df %>% transmute(!!paste0('a_lag', .x) := dplyr::lag(a, .x))))` – GiulioGCantone Dec 30 '21 at 01:35
The result is exactly the same, you can check it. – GiulioGCantone Dec 30 '21 at 01:42

jpdugo17 · Answer 3 · 2021-12-30T19:52:57.787

0

Edit 2: Reworked the answer to contemplate the case of a grouped df.

library(tidyverse)
set.seed(123)

df <- tibble(group = sample(letters[1:3], 30, replace = TRUE), x = c(1:30))

formulas <- seq(3, 12, 3) %>%
  map(~ as.formula(str_glue("~lag(.,n={.x})"))) %>%
  set_names(str_c("lag", seq(3, 12, 3)))

df %>% 
  summarise(x, across(x, lst(!!!formulas)))
#> # A tibble: 30 × 5
#>        x x_lag3 x_lag6 x_lag9 x_lag12
#>    <int>  <int>  <int>  <int>   <int>
#>  1     1     NA     NA     NA      NA
#>  2     2     NA     NA     NA      NA
#>  3     3     NA     NA     NA      NA
#>  4     4      1     NA     NA      NA
#>  5     5      2     NA     NA      NA
#>  6     6      3     NA     NA      NA
#>  7     7      4      1     NA      NA
#>  8     8      5      2     NA      NA
#>  9     9      6      3     NA      NA
#> 10    10      7      4      1      NA
#> # … with 20 more rows

df %>%
  group_by(group) %>%
  summarise(x, across(x, lst(!!!formulas)), .groups = "drop")
#> # A tibble: 30 × 6
#>    group     x x_lag3 x_lag6 x_lag9 x_lag12
#>    <chr> <int>  <int>  <int>  <int>   <int>
#>  1 a        10     NA     NA     NA      NA
#>  2 a        13     NA     NA     NA      NA
#>  3 a        16     NA     NA     NA      NA
#>  4 a        19     10     NA     NA      NA
#>  5 a        20     13     NA     NA      NA
#>  6 a        21     16     NA     NA      NA
#>  7 a        22     19     10     NA      NA
#>  8 a        27     20     13     NA      NA
#>  9 b         4     NA     NA     NA      NA
#> 10 b         6     NA     NA     NA      NA
#> # … with 20 more rows

^{Created on 2021-12-30 by the reprex package (v2.0.1)}

edited Dec 30 '21 at 19:52

answered Dec 28 '21 at 05:12

jpdugo17

6,816
2
11
23

This works pretty bad when instead of `1:n` there is a sequence of integers through a `by=`. – GiulioGCantone Dec 30 '21 at 00:02
What do you mean by 'sequence of integers through a by='? – jpdugo17 Dec 30 '21 at 00:14
```df <- tibble(x = c(1:9)) walk(str_c("x_lag_", seq(6,21,by=3)), ~ { df[[.]] <<- dplyr::lag(df$x, parse_number(.)) }) ``` – GiulioGCantone Dec 30 '21 at 00:32
You are lagging by 6 positions , then 9, but the dataset contains only 9 rows. Do you want to preserve the names but lag by a different number? – jpdugo17 Dec 30 '21 at 00:46
Is there a way to make this code respect groups? `group_walk` gives this error: `Error in UseMethod("group_map") : no applicable method for 'group_map' applied to an object of class "character"` – GiulioGCantone Dec 30 '21 at 01:04
Is `df` supposed to be grouped by a variable? Where did the groups came from? – jpdugo17 Dec 30 '21 at 01:10
Imagine that df is grouped by a variable – GiulioGCantone Dec 30 '21 at 01:16
@GiulioGCantone I edited the answer to include a case when the df is grouped and what it would look like. – jpdugo17 Dec 30 '21 at 01:42
It worked, with 2 workaround: - addition in `df` `bind_cols(df,group, map_dfc(seq_len(n))`, this keeps the existence of other columns - `relocate` group, which is just to make it tidier. However, I will wait a bunch of days because I feel that this is the "wrong way" to basically make a "mass mutate". – GiulioGCantone Dec 30 '21 at 02:05

Vectorialised column addition

3 Answers3