Create multiple new columns in tibble in R based on value of previous row giving prefix to all

Question

I have a tibble as so:

df <- tibble(a = seq(1:10),
         b = seq(21,30),
         c = seq(31,40))

I want to create a new tibble in which some of the columns are lagged. The new columns should be named prev_ plus the name of the lagged column, e.g. prev_a. In my actual data there are a lot of columns, so I don't want to write them out manually. Additionally, I only want to do this for some columns. In this example I have done it manually, but I would like to know whether there is a way to use a function to do it.

df_new <- df %>%
  mutate(prev_a = lag(a),
         prev_b = lag(b),
         prev_c = lag(c))

Thanks for your help!

`mutate_at(.vars = vars("a","b","c"), function(x) lag(x))` will overwrite the original columns, but you can `bind_cols()` the result to the original `data.frame`: `bind_cols(df, df %>% mutate_at(.vars = vars("a","b","c"), function(x) lag(x)))`. If you have dplyr 1.0, you can use the `across` function. — Matias Andina, Apr 29 '20 at 17:39
You can also name the function as such: `df %>% mutate_at(.vars = vars("a","b","c"), .funs = list(prev = function(x) lag(x)))`. This will create columns `a_prev`, `b_prev`, `c_prev`. — Bas, Apr 29 '20 at 17:47

TimTeaFan · Answer 1 · 2020-04-29T18:59:26.190

With the current dplyr version you can create new columns with mutate_at: when you pass a named list of functions, the name of each list element is appended to the column names as a suffix. If you want it as a prefix, as in your example, you can use rename_at to fix the names afterwards. With your real data, you need to adjust the vars() selection. For your example data, matches("[a-c]") works.

library(dplyr)
df <- tibble(a = seq(1:10),
             b = seq(21,30),
             c = seq(31,40))
df %>% 
  mutate_at(vars(matches("[a-c]")), list(prev = ~ lag(.x))) 
#> # A tibble: 10 x 6
#>        a     b     c a_prev b_prev c_prev
#>    <int> <int> <int>  <int>  <int>  <int>
#>  1     1    21    31     NA     NA     NA
#>  2     2    22    32      1     21     31
#>  3     3    23    33      2     22     32
#>  4     4    24    34      3     23     33
#>  5     5    25    35      4     24     34
#>  6     6    26    36      5     25     35
#>  7     7    27    37      6     26     36
#>  8     8    28    38      7     27     37
#>  9     9    29    39      8     28     38
#> 10    10    30    40      9     29     39

df %>% 
  mutate_at(vars(matches("[a-c]")), list(prev = ~ lag(.x))) %>% 
  rename_at(vars(contains( "_prev") ), list( ~paste("prev", gsub("_prev", "", .), sep = "_")))
#> # A tibble: 10 x 6
#>        a     b     c prev_a prev_b prev_c
#>    <int> <int> <int>  <int>  <int>  <int>
#>  1     1    21    31     NA     NA     NA
#>  2     2    22    32      1     21     31
#>  3     3    23    33      2     22     32
#>  4     4    24    34      3     23     33
#>  5     5    25    35      4     24     34
#>  6     6    26    36      5     25     35
#>  7     7    27    37      6     26     36
#>  8     8    28    38      7     27     37
#>  9     9    29    39      8     28     38
#> 10    10    30    40      9     29     39

Created on 2020-04-29 by the reprex package (v0.3.0)
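If dplyr 1.0.0 or later is available, the renaming step can probably be written a little more compactly with rename_with(). The following is only a sketch under that assumption. It selects the columns by name with vars(a, b, c) instead of the regex used above, and the sub() pattern assumes the suffix is exactly "_prev".

```r
library(dplyr)

df <- tibble(a = 1:10, b = 21:30, c = 31:40)

df_new <- df %>%
  # named list => new columns a_prev, b_prev, c_prev
  mutate_at(vars(a, b, c), list(prev = ~ dplyr::lag(.x))) %>%
  # strip the "_prev" suffix and put "prev_" in front instead
  rename_with(~ paste0("prev_", sub("_prev$", "", .x)),
              .cols = ends_with("_prev"))

names(df_new)
```

Only the columns matched by ends_with("_prev") are passed to the renaming function, so a, b and c keep their names.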

score 1 · Accepted Answer · answered Apr 29 '20 at 17:47

You could do it this way:

df_new <- bind_cols(
 df,
 df %>% mutate_at(.vars = vars("a","b","c"), function(x) lag(x))
)

The names are a bit nasty, but you can rename them afterwards. Alternatively, see @Bas's comment to get the names with a suffix.

# A tibble: 10 x 6
       a     b     c    a1    b1    c1
   <int> <int> <int> <int> <int> <int>
 1     1    21    31    NA    NA    NA
 2     2    22    32     1    21    31
 3     3    23    33     2    22    32
 4     4    24    34     3    23    33
 5     5    25    35     4    24    34
 6     6    26    36     5    25    35
 7     7    27    37     6    26    36
 8     8    28    38     7    27    37
 9     9    29    39     8    28    38
10    10    30    40     9    29    39
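One way to avoid the automatically generated names is to rename the lagged copy before binding it back. This is a minimal sketch, not the only option. The intermediate name lagged is just for illustration, and select(a, b, c) stands in for whichever columns you actually want to lag.

```r
library(dplyr)

df <- tibble(a = 1:10, b = 21:30, c = 31:40)

# Lag only the chosen columns and prefix their names with "prev_"
lagged <- df %>%
  select(a, b, c) %>%
  mutate_all(~ dplyr::lag(.x)) %>%
  rename_all(~ paste0("prev_", .x))

# No name clashes any more, so bind_cols() keeps the names as they are
df_new <- bind_cols(df, lagged)

names(df_new)
```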

If you have dplyr 1.0, you can use the new across() function.

See some examples from the docs below; instead of mean, you would use lag.

df %>% mutate_if(is.numeric, mean, na.rm = TRUE)
# ->
df %>% mutate(across(where(is.numeric), mean, na.rm = TRUE))

df %>% mutate_at(vars(x, starts_with("y")), mean, na.rm = TRUE)
# ->
df %>% mutate(across(c(x, starts_with("y")), mean, na.rm = TRUE))

df %>% mutate_all(mean, na.rm = TRUE)
# ->
df %>% mutate(across(everything(), mean, na.rm = TRUE))
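Applied to the question, an across() version might look like the sketch below. It assumes a reasonably recent dplyr in which the .names argument of across() accepts the {.col} placeholder. The vector cols_to_lag is a hypothetical name for the subset of columns you want to lag.

```r
library(dplyr)

df <- tibble(a = 1:10, b = 21:30, c = 31:40)

# Hypothetical selection: only lag some of the columns
cols_to_lag <- c("a", "b")

df_new <- df %>%
  mutate(across(all_of(cols_to_lag),
                ~ dplyr::lag(.x),
                .names = "prev_{.col}"))  # prefix instead of suffix

names(df_new)
```

Because .names is given, the original columns are left untouched and the lagged values go into new prev_ columns, so no separate renaming step should be needed.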
