
This is an extension of a similar question, not a duplicate of it. What is the quickest way to generate multiple variables based on ones you have just created in mutate, and to name them dynamically? E.g.

library(dplyr)  
df<- data.frame(gg = rep(6:10),
                ba = rep(1:5))
df
  gg ba
1  6  1
2  7  2
3  8  3
4  9  4
5 10  5

desired output:

df_new
  gg ba diff.1 diff.2 sum_dif.1 sum_dif.2
1  6  1      5     10        25        50
2  7  2      5     10        25        50
3  8  3      5     10        25        50
4  9  4      5     10        25        50
5 10  5      5     10        25        50

Following the similar question I referenced, I can get diff.1 and diff.2:

myfun <- function(df, n) {
  varname <- paste("diff", n , sep=".")
  mutate(df, !!varname := (gg - ba)*n)
}

for(i in 1:2) {
  df <- myfun(df, n=i)
}

which gives

df
  gg ba diff.1 diff.2
1  6  1      5     10
2  7  2      5     10
3  8  3      5     10
4  9  4      5     10
5 10  5      5     10

But I am not sure how to pass the generated variable to another line within the same mutate call. I thought something like this would work:

myfun <- function(df, n) {
  varname <- paste("diff", n , sep=".")
  varname2 <- paste("sum_dif", n , sep=".")
  mutate(df, !!varname := (gg - ba)*n,
             !!varname2 := sum(!!varname))
}

Also happy to get any other solutions, maybe data.table? Thanks

user63230

1 Answer


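On the right-hand side, !!varname unquotes a string, so sum() receives the text rather than the column. We need to convert the string to a symbol with rlang::sym() before unquoting it with !!: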

myfun <- function(df, n) {
  varname <- paste("diff", n , sep=".")
  varname2 <- paste("sum_dif", n , sep=".")
  mutate(df, !!varname := (gg - ba)*n,
             !!varname2 := sum(!! rlang::sym(varname)))
}
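
To see why the symbol conversion matters, here is a minimal sketch that builds the two expressions without evaluating them. It assumes the rlang package is installed; the names as_string_call and as_symbol_call are made up for illustration.

```r
library(rlang)

varname <- "diff.1"

# Unquoting the string injects a character constant, so sum() would be
# handed the text "diff.1" rather than a column.
as_string_call <- expr(sum(!!varname))
print(as_string_call)

# Converting to a symbol first injects a reference to the column diff.1.
as_symbol_call <- expr(sum(!!sym(varname)))
print(as_symbol_call)
```

The first expression prints as sum("diff.1") and the second as sum(diff.1), which is the form mutate needs.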

Now, we apply the myfun

for(i in 1:2) {
  df <- myfun(df, n=i)
}

df %>%
  select(gg, ba, matches('^diff'), matches('^sum'))
#   gg ba diff.1 diff.2 sum_dif.1 sum_dif.2
#1  6  1      5     10        25        50
#2  7  2      5     10        25        50
#3  8  3      5     10        25        50
#4  9  4      5     10        25        50
#5 10  5      5     10        25        50
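
As a possible alternative, the new column can also be looked up by name through the .data pronoun, which avoids the symbol conversion. This is only a sketch: myfun_data, df2 and df2_new are hypothetical names, and Reduce() stands in for the for loop as one option among several.

```r
library(dplyr)

df2 <- data.frame(gg = 6:10, ba = 1:5)

# Hypothetical variant of myfun: same column names, but the freshly
# created column is fetched by its string name via .data[[...]].
myfun_data <- function(df, n) {
  varname  <- paste("diff", n, sep = ".")
  varname2 <- paste("sum_dif", n, sep = ".")
  mutate(df,
         !!varname  := (gg - ba) * n,
         !!varname2 := sum(.data[[varname]]))
}

# Reduce() threads the data frame through myfun_data for n = 1, then n = 2.
df2_new <- Reduce(myfun_data, 1:2, df2)
names(df2_new)
```

The columns come out in creation order (diff.1, sum_dif.1, diff.2, sum_dif.2), so the select() step is still needed if the grouped ordering matters.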
akrun
  • Thanks. If you pass a named vector into the `for` loop instead, why don't the column names change accordingly? For the 1:2 example above, say the elements are named `mod1`, `mod2`: why aren't the columns then called `diff.mod1`, `diff.mod2`, etc.? I thought `varname <- paste("diff", names(n) , sep=".")` would work. – user63230 Sep 07 '18 at 11:13
  • @user63230 The reason is the expression `(gg - ba)*n`, where `n` is a number, so the `for` loop needs some adjustment. With `v1 <- c(mod1 = 1, mod2 = 2)` as the named vector, `for(nm in names(v1)) df <- myfun(df, n = v1[nm])` works but keeps the same column names. To reflect the names in the columns, change `myfun` accordingly (see the next comment). – akrun Sep 07 '18 at 13:49
  • @user63230 `myfun <- function(df, vec, n) { varname <- paste("diff", names(vec)[n], sep="."); varname2 <- paste("sum_dif", names(vec)[n], sep="."); mutate(df, !!varname := (gg - ba)*n, !!varname2 := sum(!! rlang::sym(varname)))}; for(i in v1) df <- myfun(df, v1, n = i)` – akrun Sep 07 '18 at 13:50
  • Thank you, but this is driving me mad. It works if we have, say, `v1 <- seq(-.8, -.7, .1)` and `names(v1) <- paste("mod", seq(-.8,-.7, .1), sep="")`, but if we change the sequence to `v1 <- seq(-.8, -.7, .01)` and `names(v1) <- paste("mod", seq(-.8,-.7, .01), sep="")` it won't work: `Error: The LHS of := must be a string or a symbol`. It's just a finer sequence, so I don't understand why it doesn't work. Any ideas? – user63230 Sep 07 '18 at 15:37
  • @user63230 The reason is that we are indexing with `names(vec)[n]`, which expects an integer index from 1 upward. You can loop over the positions of `v1`, i.e. `for(i in seq_along(v1))`, and then index the index – akrun Sep 07 '18 at 15:40
  • Sorry, I don't get "index the index"? – user63230 Sep 07 '18 at 16:12
  • @user63230 What I meant is that inside the function you need `v1[i]` for `n` and `names(v1[i])` for the names – akrun Sep 07 '18 at 16:13
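
Pulling the comment thread together, here is a minimal sketch of looping over positions rather than values. It is an illustration under assumptions, not the answerer's exact code: myfun_named, df3 and the vector v1 with names a and b are hypothetical, chosen so that the values are clearly not valid positions.

```r
library(dplyr)

df3 <- data.frame(gg = 6:10, ba = 1:5)
v1  <- c(a = 10, b = 20)

# Using a value as a position does not find a name: v1 has no 10th element.
lookup_by_value <- names(v1)[v1[["a"]]]
print(lookup_by_value)

# Hypothetical helper: i is a position in vec, so the multiplier and the
# name are both taken from the same element.
myfun_named <- function(df, vec, i) {
  n  <- vec[[i]]        # the multiplier, with its name dropped
  nm <- names(vec)[i]   # the label used in the column names
  varname  <- paste("diff", nm, sep = ".")
  varname2 <- paste("sum_dif", nm, sep = ".")
  mutate(df,
         !!varname  := (gg - ba) * n,
         !!varname2 := sum(!!rlang::sym(varname)))
}

# seq_along(v1) yields 1, 2, ... regardless of what the values are.
for (i in seq_along(v1)) df3 <- myfun_named(df3, v1, i)
names(df3)
```

The earlier `for(i in v1)` version only worked for `c(mod1 = 1, mod2 = 2)` because those values happen to coincide with their positions.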