6

I have a tibble with a number of variables collected over time. A very simplified version of the tibble looks like this.

df = tribble(
~id, ~varA.t1, ~varA.t2, ~varB.t1, ~varB.t2,
'row_1', 5, 10, 2, 4,
'row_2', 20, 50, 4, 6
)

I want to systematically create a new set of variables varC so that varC.t# = varA.t# / varB.t# where # is 1, 2, 3, etc. (similarly to the way column names are setup in the tibble above).

How do I use something along the lines of mutate or across to do this?

AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
Maher Said
  • 170
  • 2
  • 15

4 Answers4

4

You can do something like this with mutate(across..., however, for renaming columns there must be a shortcut.

df %>% 
  mutate(across(.cols = c(varA.t1, varA.t2),
                .fns = ~ .x / get(glue::glue(str_replace(cur_column(), "varA", "varB"))),
                .names = "V_{.col}")) %>%
  rename_with(~str_replace(., "V_varA", "varC"), starts_with("V_"))

# A tibble: 2 x 7
  id    varA.t1 varA.t2 varB.t1 varB.t2 varC.t1 varC.t2
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 row_1       5      10       2       4     2.5    2.5 
2 row_2      20      50       4       6     5      8.33

If there is a long time series you can also create a vector for .cols beforehand.

AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
  • Thank you very much @AnilGoyal! This is exactly what I needed. I ended up using a selection helper in `.cols` (`matches()`) since I have a long time series. Correct me if I'm wrong, but we don't really need the `glue()` in `.fns` since `get()` should work on the string name directly, right? It's as very creative solution and it reminded me that `cur_column()` exists! – Maher Said Apr 13 '21 at 14:47
4

I have a package on GitHub called {dplyover} which aims to solve this kind of problem in way similar to dplyr::across.

The function is called across2. It lets you define two sets of columns to which you can apply one or several functions. The .names argument supports two glue specifictions: {pre} and {suf}. They extract the shared pre- and suffix of the variable names. This makes it easy to put nice names on our output variables.

The function has one caveat. It is not performant when applied to highly grouped data (there is a vignette with benchmarks).

library(dplyr)
library(dplyover) # https://github.com/TimTeaFan/dplyover

df = tribble(
  ~id, ~varA.t1, ~varA.t2, ~varB.t1, ~varB.t2,
  'row_1', 5, 10, 2, 4,
  'row_2', 20, 50, 4, 6
)

df %>% 
  mutate(across2(starts_with("varA"),
                 starts_with("varB"),
                 ~ .x / .y,
                 .names = "{pre}C.{suf}"))

#> # A tibble: 2 x 7
#>   id    varA.t1 varA.t2 varB.t1 varB.t2 varC.t1 varC.t2
#>   <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 row_1       5      10       2       4     2.5    2.5 
#> 2 row_2      20      50       4       6     5      8.33

Created on 2021-04-10 by the reprex package (v0.3.0)

TimTeaFan
  • 17,549
  • 4
  • 18
  • 39
3

For such cases I find using base R easy and efficient.

varAcols <- sort(grep('varA', names(df), value = TRUE))
varBcols <- sort(grep('varB', names(df), value = TRUE))
df[sub('A', 'C', varAcols)] <- df[varAcols]/df[varBcols]

#    id    varA.t1 varA.t2 varB.t1 varB.t2 varC.t1 varC.t2
#  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#1 row_1       5      10       2       4     2.5    2.5 
#2 row_2      20      50       4       6     5      8.33
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thank you @Ronak Shah and this indeed the most elegant solution. The use of `sub()` in the last line is quite smart, I wouldn't have thought of that! – Maher Said Apr 13 '21 at 14:49
2

Another way to do this with some customization is

Initial setup

library(dplyr)
library(purrr)
library(stringr)

df = tribble(
  ~id, ~varA.t1, ~varA.t2, ~varB.t1, ~varB.t2,
  'row_1', 5, 10, 2, 4,
  'row_2', 20, 50, 4, 6
)

# A function take in a formula then parse it and correct the column name
operation_function <- function(df, formula) {
  # Extract the column name from the formula
  new_column_name <- str_extract(formula, "^.+=")
  new_column_name <- trimws(gsub("=", "", new_column_name))
  
  # Process the df
  df %>%
    # parse the formula - this reuslt in new column name  as value formula
    mutate(!!rlang::parse_expr(formula)) %>%
    # rename the new created column with the correct column name
    rename(!!new_column_name := last_col())
}

Note: I think there should be more efficient way to implement the formula that have proper name. Though I couldn't figure it out right now. Welcome ideas from others

Prepare the formula to be process by the data. In this case it simple

For more complicated formula you may want to do it a little bit differently

# Prepare the formula
base_formula <- c("varC.t# = varA.t# / varB.t#")
replacement_list <- c(1, 2)
list_formula <- map(replacement_list, .f = gsub,
  pattern = "#", x = base_formula)
list_formula
#> [[1]]
#> [1] "varC.t1 = varA.t1 / varB.t1"
#> 
#> [[2]]
#> [1] "varC.t2 = varA.t2 / varB.t2"

Finally process the data with the list of formulas

# process with the function and then reduce them with left_join
reduce(map(.x = list_formula, .f = operation_function, df = df),
  left_join)
#> Joining, by = c("id", "varA.t1", "varA.t2", "varB.t1", "varB.t2")
#> # A tibble: 2 x 7
#>   id    varA.t1 varA.t2 varB.t1 varB.t2 varC.t1 varC.t2
#>   <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 row_1       5      10       2       4     2.5    2.5 
#> 2 row_2      20      50       4       6     5      8.33

Created on 2021-04-10 by the reprex package (v1.0.0)

Sinh Nguyen
  • 4,277
  • 3
  • 18
  • 26
  • 1
    Interesting take @Sinh Nguyen! I need to take a closer look at this when I have some more time. I never used formulas in my own functions, so your response is a great learning opportunity. Thank you! – Maher Said Apr 13 '21 at 14:52