I usually have to perform equivalent calculations on a series of variables/columns that can be identified by their suffix (ranging, let's say from _a to _i) and save the result in new variables/columns. The calculations are equivalent, but vary between the variables used in the calculations. These again can be identified by the same suffix (_a to _i). So what I basically want to achieve is the following:

newvar_a = (oldvar1_a + oldvar2_a) - z
...
newvar_i = (oldvar1_i + oldvar2_i) - z

This is the furthest I got:

mutate(across(c(oldvar1_a:oldvar1_i), ~ . - z, .names = "{col}_new"))

Thus, I'm able to "loop" over oldvar1_a to oldvar1_i, subtract z from them and save the results in new columns named oldvar1_a_new to oldvar1_i_new. However, I'm not able to include oldvar2_a to oldvar2_i in the calculations, as R won't loop over them. (Additionally, I'd still need to rename the new columns.)

I found a way to achieve the result using a for-loop. However, this definitely doesn't look like the most efficient and straightforward way to do it:

for (i in letters[1:9]) {
  oldvar1_x <- paste0("oldvar1_", i)
  oldvar2_x <- paste0("oldvar2_", i)
  newvar_x <- paste0("newvar_", i)
  df <- df %>%
    mutate(!!sym(newvar_x) := (!!sym(oldvar1_x) + !!sym(oldvar2_x)) - z)
}

Thus, I'd like to know whether and how mutate(across()) can loop over multiple sets of columns that are identified by their suffixes (as in the example above).

2 Answers

In this case, you can use cur_data() and cur_column() inside across(). We want to sum columns that share a suffix, so we only need to swap the number in the current column's name to find its partner column.

library(dplyr)

df <- data.frame(
  oldvar1_a = 1:3,
  oldvar2_a = 4:6,
  oldvar1_i = 7:9,
  oldvar2_i = 10:12,
  z = c(1,10,20)
)

mutate(
  df,
  across(
    starts_with("oldvar1"),
    # index with [[ so the partner column comes back as a vector, not a data frame
    ~ (.x + cur_data()[[gsub("1", "2", cur_column())]]) - z,
    .names = "{col}_new"
  )
)
#>   oldvar1_a oldvar2_a oldvar1_i oldvar2_i  z oldvar1_a_new oldvar1_i_new
#> 1         1         4         7        10  1             4            16
#> 2         2         5         8        11 10            -3             9
#> 3         3         6         9        12 20           -11             1
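
The question also asks for the results to be named newvar_a to newvar_i. A minimal sketch of one way to get there, relying on the fact that .names is a glue specification, so an expression such as sub() can be evaluated inside the braces. The data frame is the same toy example as above, and res is just an illustrative name. Note that in dplyr 1.1.0 and later cur_data() is deprecated in favour of pick(), so this may emit a deprecation warning there.

```r
library(dplyr)

df <- data.frame(
  oldvar1_a = 1:3,
  oldvar2_a = 4:6,
  oldvar1_i = 7:9,
  oldvar2_i = 10:12,
  z = c(1, 10, 20)
)

res <- mutate(
  df,
  across(
    starts_with("oldvar1"),
    # look up the matching oldvar2_* column as a vector with [[ ]]
    ~ (.x + cur_data()[[sub("oldvar1", "oldvar2", cur_column())]]) - z,
    # rewrite the current column name, e.g. oldvar1_a -> newvar_a
    .names = "{sub('oldvar1', 'newvar', .col)}"
  )
)

names(res)
```

Matching on the full prefix "oldvar1" rather than on the digit "1" alone is a defensive choice: it avoids accidental replacements if a suffix ever contains a digit.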

This also works with case_when, as long as you index using [[ so that a vector is returned rather than a data frame; you can read more here.

df <- data.frame(
  oldvar1_a = 1:3,
  oldvar2_a = 4:6,
  oldvar1_i = 7:9,
  oldvar2_i = 10:12,
  z = c(1,2,0)
)

mutate(
  df,
  across(
    starts_with("oldvar1"),
    ~ case_when(
      z == 1 ~ .x,
      z == 2 ~ cur_data()[[gsub("1", "2", cur_column())]],
      TRUE ~ NA_integer_
    ),
    .names = "{col}_new"
  )
)
#>   oldvar1_a oldvar2_a oldvar1_i oldvar2_i z oldvar1_a_new oldvar1_i_new
#> 1         1         4         7        10 1             1             7
#> 2         2         5         8        11 2             5            11
#> 3         3         6         9        12 0            NA            NA
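
To illustrate why the indexing operator matters here, a small base R sketch (the names one_bracket and two_brackets are only for illustration): a single [ keeps the data frame wrapper, while [[ extracts the column itself as a vector, which is what case_when needs.

```r
df <- data.frame(oldvar2_a = 4:6)

# single bracket: still a data frame with one column
one_bracket <- df["oldvar2_a"]
class(one_bracket)

# double bracket: the underlying integer vector
two_brackets <- df[["oldvar2_a"]]
class(two_brackets)
```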
caldwellst
  • Thank you @caldwellst . Just a short follow-up question: Does this also work when working with case_when statements? So, in cases I want newvar_a to newvar_i to get the values of oldvar1_a to oldvar1_i if z == 1 and oldvar2_a to oldvar2_i if z == 2 (and else NaNs). Unfortunately, I couldn't make this work playing around with your solution to the question above. – user17487234 Nov 24 '21 at 10:12
  • @user17487234 I have added this to the main body. The problem you would run into comes from indexing with a single `[`. I'd recommend familiarising yourself with the various indexing methods and why you were getting back a `tbl_df` rather than a vector. – caldwellst Nov 24 '21 at 10:25

There is a fairly straightforward way to do what I believe you are attempting to do.

# first, let's create some data
library(dplyr)
df <- data.frame(var1_a=runif(10, min = 128, max = 131), 
                 var2_a=runif(10, min = 128, max = 131),
                 var1_b=runif(10, min = 128, max = 131), 
                 var2_b=runif(10, min = 128, max = 131),
                 var1_c=runif(10, min = 128, max = 131), 
                 var2_c=runif(10, min = 128, max = 131))
# taking a wild guess at what your z is
z <- 4
# initialize a list
fnl   <- list()

# iterate over all your suffixes, put each group's columns in the list
for (i in letters[1:3])
{
  dc   <- df %>% select(ends_with(i))
  # row-wise sum of the group's columns, minus z (stored in a column named a)
  res  <- dc %>% mutate(a = rowSums(dc) - z)
  fnl  <- append(fnl, res)
}

# convert to a dataframe/tibble  
final <- bind_cols(fnl)

I left the column names sloppy, assuming you had specific requirements here. You can convert this loop into a function and do the whole thing in a single step using purrr.
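
A rough sketch of what that purrr version could look like. The helper sum_suffix, the newvar_ naming scheme and the small fixed data frame are assumptions made for illustration, not part of the loop above.

```r
library(dplyr)
library(purrr)

# small deterministic stand-in for the random data above
df <- data.frame(
  var1_a = c(1, 2), var2_a = c(3, 4),
  var1_b = c(5, 6), var2_b = c(7, 8)
)
z <- 4

# hypothetical helper: row-wise sum of all columns ending in "_<suffix>", minus z
sum_suffix <- function(data, suffix, z) {
  cols <- select(data, ends_with(paste0("_", suffix)))
  unname(rowSums(cols)) - z
}

suffixes <- c("a", "b")

# one result vector per suffix, named newvar_a, newvar_b
new_cols <- suffixes %>%
  set_names(paste0("newvar_", suffixes)) %>%
  map(~ sum_suffix(df, .x, z))

# attach the new columns to the original data
final <- bind_cols(df, as_tibble(new_cols))
```

Naming the input vector before calling map() means the resulting list already carries the output column names, so no renaming step is needed afterwards.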

Jim