zed <- data.frame(
  aAgg = c(5, 10, 15, 20),
  bAgg = c(8, 16, 24, 32),
  aPg = c(6, 9, 11, 24),
  bPg = c(7, 15, 22, 26)
)

diff_func <- function(col) {
  return(`{col}Agg` - `{colPg}`)
}

zed %>% 
  dplyr::mutate(dplyr::across(.cols = c('a', 'b'), .fns = diff_func, .names = "{col}Diff"))

# This produces the output we want, but requires a separate mutate() for each field.
zed <- zed %>%
  dplyr::mutate(aDiff = aAgg - aPg) %>%
  dplyr::mutate(bDiff = bAgg - bPg)

We are trying to use dplyr's across() to create multiple columns at once. For each column prefix (a and b in this example), we'd like to compute prefixAgg - prefixPg and name the new column prefixDiff. The last 3 lines of code in the example above generate the desired output. Our diff_func is not correct as written and throws an error.

Is there a function we can pass to across that will generate this output?

Canovice

2 Answers

We can loop over either the 'Agg' columns or the 'Pg' columns, look up the matching partner column by replacing the suffix in the current column name (cur_column()), and set .names so that each result is named with the 'Diff' suffix:

library(dplyr)
library(stringr)
zed %>%
  mutate(across(
    ends_with("Agg"),
    # subtract the matching 'Pg' column, looked up by name
    ~ .x - get(str_replace(cur_column(), "Agg", "Pg")),
    # name each result <prefix>Diff
    .names = "{str_replace(.col, 'Agg', 'Diff')}"
  ))

Output:

  aAgg bAgg aPg bPg aDiff bDiff
1    5    8   6   7    -1     1
2   10   16   9  15     1     1
3   15   24  11  22     4     2
4   20   32  24  26    -4     6
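
The name-matching idea above can also be sketched in base R, without across(). This is only an illustration under the assumption that every 'Agg' column has a 'Pg' partner with the same prefix; the names prefixes and result are introduced here and are not part of the original answer.

```r
# Base R sketch: derive the prefixes, then build one <prefix>Diff column each.
zed <- data.frame(
  aAgg = c(5, 10, 15, 20),
  bAgg = c(8, 16, 24, 32),
  aPg = c(6, 9, 11, 24),
  bPg = c(7, 15, 22, 26)
)

# Take the prefixes from the 'Agg' column names ("aAgg" -> "a").
prefixes <- sub("Agg$", "", grep("Agg$", names(zed), value = TRUE))

result <- zed
for (p in prefixes) {
  # Look up both partner columns by name and store the difference.
  result[[paste0(p, "Diff")]] <- zed[[paste0(p, "Agg")]] - zed[[paste0(p, "Pg")]]
}
result
```

Because both columns are looked up by name, this does not depend on the order of the columns in zed.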

Or use two across() calls and subtract one from the other. The result is a single data.frame/tibble column, which can then be unpacked into ordinary columns:

library(tidyr)
zed %>% 
  mutate(Diff = across(ends_with("Agg")) - across(ends_with("Pg"))) %>% 
  unpack(where(is.data.frame), names_sep = "")
# A tibble: 4 × 6
   aAgg  bAgg   aPg   bPg DiffaAgg DiffbAgg
  <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1     5     8     6     7       -1        1
2    10    16     9    15        1        1
3    15    24    11    22        4        2
4    20    32    24    26       -4        6

NOTE: If needed, the columns can be renamed by setting .names in the first across():

zed %>% 
  mutate(across(ends_with("Agg"), 
  .names = "{str_remove(.col, 'Agg')}Diff") - 
      across(ends_with("Pg")))
  aAgg bAgg aPg bPg aDiff bDiff
1    5    8   6   7    -1     1
2   10   16   9  15     1     1
3   15   24  11  22     4     2
4   20   32  24  26    -4     6
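
The subtraction of two across() results relies on ordinary data-frame arithmetic, which pairs columns by position and keeps the names of the left-hand side. A minimal base R sketch of just that step (the names agg, pg and d are introduced here for illustration):

```r
zed <- data.frame(
  aAgg = c(5, 10, 15, 20),
  bAgg = c(8, 16, 24, 32),
  aPg = c(6, 9, 11, 24),
  bPg = c(7, 15, 22, 26)
)

agg <- zed[c("aAgg", "bAgg")]
pg <- zed[c("aPg", "bPg")]

# Columns are paired by position: aAgg with aPg, bAgg with bPg.
# The result keeps the column names of the left-hand data frame.
d <- agg - pg
names(d)
d
```

Since the pairing is positional, this approach assumes the 'Agg' and 'Pg' columns appear in the same prefix order; if they might not, the name-based lookup is the safer choice.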

Alternatively, use across2() from the dplyover package:

library(dplyover)
zed %>%
  mutate(across2(ends_with("Agg"), ends_with("Pg"), `-`, 
  .names_fn = ~ str_replace(.x, "Agg_.*", "Diff")))
  aAgg bAgg aPg bPg aDiff bDiff
1    5    8   6   7    -1     1
2   10   16   9  15     1     1
3   15   24  11  22     4     2
4   20   32  24  26    -4     6
akrun

A solution using split.default and dplyr (probably close to the fastest solution you can get; see here):

zed <- data.frame(
  aAgg = c(5, 10, 15, 20),
  bAgg = c(8, 16, 24, 32),
  aPg = c(6, 9, 11, 24),
  bPg = c(7, 15, 22, 26)
)

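library(dplyr, warn.conflicts = FALSE)
zed %>% 
  # group the columns by the first character of their names
  split.default(
    sub('^(.{1}).*', '\\1', names(zed))
  ) %>% 
  # within each group, subtract the second column (Pg) from the first (Agg)
  lapply(
    function(.x) .x[[1]] - .x[[2]]
  ) %>% 
  setNames(., paste0(names(.), 'Diff')) %>% 
  # splice the named list into mutate() as new columns
  mutate(zed, !!!.)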
#>   aAgg bAgg aPg bPg aDiff bDiff
#> 1    5    8   6   7    -1     1
#> 2   10   16   9  15     1     1
#> 3   15   24  11  22     4     2
#> 4   20   32  24  26    -4     6

Created on 2022-08-09 by the reprex package (v2.0.1)
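
To see what the split.default step produces, here is a small sketch of the grouping on its own. It uses substr() in place of the sub() pattern above, which gives the same one-character key for these column names; the name groups is introduced here for illustration.

```r
zed <- data.frame(
  aAgg = c(5, 10, 15, 20),
  bAgg = c(8, 16, 24, 32),
  aPg = c(6, 9, 11, 24),
  bPg = c(7, 15, 22, 26)
)

# Split the columns (not the rows) by the first character of each name.
groups <- split.default(zed, substr(names(zed), 1, 1))

names(groups)    # one element per prefix
names(groups$a)  # the columns that share prefix "a", in their original order
```

Each element is a two-column data frame, so .x[[1]] - .x[[2]] is Agg minus Pg only as long as the 'Agg' column comes before the 'Pg' column in zed and prefixes are a single character.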

Baraliuh