Create data frame variables based on a function with two matching variable arguments where argument order matters

Question

Here is a toy data frame:

df <- data.frame(alpha = c(rep(.005,5)),
                 a1 = c(1:5), 
                 b1 = c(4:8), 
                 c1 = c(10:14), 
                 a2 = c(9:13), 
                 b2 = c(3:7), 
                 c2 = c(15:19))

Here is a nonsensical toy function that requires two variables, both of which must have the same letter prefix. The specific function calculation is not important. Rather, the issue is how to pass two or more separate named variables to the function from the data frame where the order of the arguments matters.

toy_function <- function(x,y){
  z = x+y
  w = x/y
  v = z+w
  return(v)
}
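
To make the "order matters" point concrete, here is a quick check. The function is repeated so the snippet runs on its own; the input values 1 and 9 are simply the first row of a1 and a2.

```r
toy_function <- function(x, y) {
  z <- x + y
  w <- x / y
  z + w
}

# The sum x + y is symmetric, but the ratio x / y is not,
# so swapping the arguments changes the result.
toy_function(1, 9)  # 1 + 9 + 1/9
toy_function(9, 1)  # 9 + 1 + 9/1, i.e. 19
```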

Calculating the new variables manually would look like this, which is not practical when there are dozens or hundreds of variable pairs.

library(dplyr)

df2 <- df %>%
  mutate(va = toy_function(a1, a2),
         vb = toy_function(b1, b2),
         vc = toy_function(c1, c2))

How can I do this across all matching pairs of variables? This problem seems similar to "How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs", but that example applied a simple mathematical function (e.g., +) for which variable order does not matter. I'm having trouble figuring out how to modify it for this case.

Ronak Shah · Accepted Answer · 2020-11-24T02:03:08.290

Here is one base R approach using split.default.

cbind(df, sapply(split.default(df[-1], 
                     sub('\\d+', '', names(df)[-1])), function(x) 
  toy_function(x[[1]], x[[2]])))

#  alpha a1 b1 c1 a2 b2 c2    a     b    c
#1 0.005  1  4 10  9  3 15 10.1  8.33 25.7
#2 0.005  2  5 11 10  4 16 12.2 10.25 27.7
#3 0.005  3  6 12 11  5 17 14.3 12.20 29.7
#4 0.005  4  7 13 12  6 18 16.3 14.17 31.7
#5 0.005  5  8 14 13  7 19 18.4 16.14 33.7

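We drop the first column (df[-1]) because alpha is not part of the calculation, group the remaining columns by name with the digits removed, and split them into a list with one data frame per prefix. sapply then applies toy_function to each element of that list. Within each group the columns keep their original order in df, so x[[1]] is the "1" column and x[[2]] is the "2" column only because a1, b1, c1 appear before a2, b2, c2 here.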

sub is used to remove the numbers from the names and create groups to split on.

sub('\\d+', '', names(df)[-1])
#[1] "a" "b" "c" "a" "b" "c"
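
Because the call relies on column position inside each group, it can help to inspect the groups before applying the function. The following is a minimal sketch, not part of the original answer: it sorts the columns by name first (an added safeguard that assumes the suffixes are the single digits 1 and 2) so that the pair order no longer depends on how the data frame is laid out. The names cols and groups are illustrative.

```r
df <- data.frame(alpha = rep(.005, 5),
                 a1 = 1:5, b1 = 4:8, c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

cols <- df[-1]
# Sort columns by name so that, within each prefix, "1" precedes "2"
# regardless of the column layout (assumes single-digit suffixes).
cols <- cols[order(names(cols))]

# One data frame per prefix, with the pair in argument order.
groups <- split.default(cols, sub("\\d+$", "", names(cols)))

# Show which columns ended up in each group, and in what order.
lapply(groups, names)
```

If each group lists the "1" column first, x[[1]] and x[[2]] map to the intended arguments.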

If you prefer a tidyverse approach, you could do:

library(dplyr)
library(purrr)

unique_names <- unique(sub('\\d+', '', names(df)[-1]))
map_dfc(unique_names, ~df[-1] %>%
                    select(matches(.x)) %>%
                    mutate(!!paste0('v', .x) := toy_function(.[[1]], .[[2]])))

#  a1 a2   va b1 b2    vb c1 c2   vc
#1  1  9 10.1  4  3  8.33 10 15 25.7
#2  2 10 12.2  5  4 10.25 11 16 27.7
#3  3 11 14.3  6  5 12.20 12 17 29.7
#4  4 12 16.3  7  6 14.17 13 18 31.7
#5  5 13 18.4  8  7 16.14 14 19 33.7
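
Note that the result above does not include alpha and depends on the order of the columns returned by matches(). As a hedged alternative sketch (not the answer's own code), each pair can be looked up by name, which makes the argument order explicit and keeps the original columns. It assumes every prefix has exactly one column ending in 1 and one ending in 2; prefixes and new_cols are illustrative names.

```r
library(dplyr)
library(purrr)

df <- data.frame(alpha = rep(.005, 5),
                 a1 = 1:5, b1 = 4:8, c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

toy_function <- function(x, y) {
  z <- x + y
  w <- x / y
  z + w
}

prefixes <- unique(sub("\\d+$", "", names(df)[-1]))

# Look up "<prefix>1" and "<prefix>2" by name, so the first argument is
# always the "1" column no matter where it sits in the data frame.
new_cols <- map(set_names(prefixes, paste0("v", prefixes)),
                ~ toy_function(df[[paste0(.x, 1)]], df[[paste0(.x, 2)]]))

# Append the new columns va, vb, vc to the full data frame.
df2 <- bind_cols(df, as_tibble(new_cols))
```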

score 0 · Answer 2 · answered Nov 24 '20 at 05:36

You can do something like this.

First, create a dataframe with the function arguments as columns and the values to be used for each function call as rows.

library(dplyr)
library(purrr)

vars <- letters[1:3]

args <- tibble(
  arg1 = setNames(paste0(vars, 1), paste0("set_output_names_like_this_", vars)),
  arg2 = paste0(vars, 2)
)

> str(args)
tibble [3 x 2] (S3: tbl_df/tbl/data.frame)
 $ arg1: Named chr [1:3] "a1" "b1" "c1"
  ..- attr(*, "names")= chr [1:3] "set_output_names_like_this_a" "set_output_names_like_this_b" "set_output_names_like_this_c"
 $ arg2: chr [1:3] "a2" "b2" "c2"

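Then use pmap_dfc inside mutate. pmap_dfc calls the function once per row of args, with .data (the data frame being mutated) passed as the extra argument d, and binds the results together as columns. The names attached to arg1 become the names of the new columns.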

df %>% mutate(pmap_dfc(args, function(arg1, arg2, d) toy_function(d[[arg1]], d[[arg2]]), .data))

Output

  alpha a1 b1 c1 a2 b2 c2 set_output_names_like_this_a set_output_names_like_this_b set_output_names_like_this_c
1 0.005  1  4 10  9  3 15                     10.11111                     8.333333                     25.66667
2 0.005  2  5 11 10  4 16                     12.20000                    10.250000                     27.68750
3 0.005  3  6 12 11  5 17                     14.27273                    12.200000                     29.70588
4 0.005  4  7 13 12  6 18                     16.33333                    14.166667                     31.72222
5 0.005  5  8 14 13  7 19                     18.38462                    16.142857                     33.73684
