Using dplyr::across with two sets of variables

Question

I have two sets of variables, for example variable a and variable a_avail. I am trying to change the value of a based on the value of a_avail and am wondering if this can be done using across with glue.

Here is what I have tried. No error is produced, but the glue does not appear to be picking up the values of .x_avail since all returned values are NA:

library(tidyverse)

df <- tibble(a = c(0, 1, 0, 0, 0),
       a_avail = c(1, 1, 1, 0, 0),
       b = c(1, 1, 1, 0, 0),
       b_avail = c(1, 0, 0, 1, 0))

df2 <- df %>% 
  mutate(across(.cols = c(a, b),
                .fns = ~case_when(
                  glue::glue("{.x}_avail") == 1 ~ .x,
                  glue::glue("{.x}_avail") == 0 ~ as.numeric(NA)
                ),
                .names = "{.col}_new"))

df2
#> # A tibble: 5 x 6
#>       a a_avail     b b_avail a_new b_new
#>   <dbl>   <dbl> <dbl>   <dbl> <dbl> <dbl>
#> 1     0       1     1       1    NA    NA
#> 2     1       1     1       0    NA    NA
#> 3     0       1     1       0    NA    NA
#> 4     0       0     0       1    NA    NA
#> 5     0       0     0       0    NA    NA

^{Created on 2021-02-12 by the reprex package (v0.3.0)}

score 3 · Accepted Answer · answered Apr 09 '21 at 06:45

Ronak Shah in his answer to a related question has suggested a fantastic approach, which I am reproducing below.

Actually two things

Inside mutate(across.. to use column/variable name instead of value cur_column() should be used as against . or .x.
get() may also be used alongwith glue so that R recognises it as a variable.

Do this

df %>% 
  mutate(across(.cols = c(a, b),
                .fns = ~case_when(
                  get(glue::glue("{cur_column()}_avail")) == 1 ~ .x,
                  get(glue::glue("{cur_column()}_avail")) == 0 ~ NA_real_
                ),
                .names = "{.col}_new"))

# A tibble: 5 x 6
      a a_avail     b b_avail a_new b_new
  <dbl>   <dbl> <dbl>   <dbl> <dbl> <dbl>
1     0       1     1       1     0     1
2     1       1     1       0     1    NA
3     0       1     1       0     0    NA
4     0       0     0       1    NA     0
5     0       0     0       0    NA    NA

score 2 · Answer 2 · answered Feb 12 '21 at 16:03

Not a tidyverse solution but this should work

library(tidyverse)

df <- tibble(a = c(0, 1, 0, 0, 0),
             a_avail = c(1, 1, 1, 0, 0),
             b = c(1, 1, 1, 0, 0),
             b_avail = c(1, 0, 0, 1, 0))


v1 <- list('a','b')
v2 <- list('a_avail','b_avail')


v3 <- as.data.frame(mapply(function(x,y){ifelse(df[[y]] == 0, NA,df[[x]])} , v1,v2, 
                           SIMPLIFY = TRUE))

names(v3) <- paste0(v1,"_new")

df3 <- cbind(df, v3)

score 2 · Answer 3 · answered Apr 07 '21 at 18:21

The primary issue you're having is in referring to columns themselves, rather than just comparing strings (or glue objects) with a number. You could probably put together a tidyeval function, but the (possibly) easier method is to reshape the data to a long format to have a column for your original values and a column for availability, adding the comparison in a new column, then reshaping back. This will also scale so you don't have to specify all the columns where you want to do this operation, or accurately match originals with available manually.

The first trick will be to have some way of marking the original columns, so you can split e.g. "a" from "avail". For that, tack on another string to the names that are only single characters. (You could use a different method of selecting columns.) Use an ID to mark off rows—you can drop this column later. Second trick is making use of the special ".value" term in the pivot functions.

I'd suggest walking through the reshaping steps one by one to see how they work and adjust as needed.

library(dplyr)
library(tidyr)
df %>%
  rename_with(~paste(., "orig", sep = "_"), matches("^[a-z]$")) %>%
  tibble::rowid_to_column() %>%
  pivot_longer(-rowid, names_to = c("col", ".value"), names_sep = "_") %>%
  mutate(new = if_else(avail == 1, orig, NA_real_)) %>%
  pivot_wider(id_cols = rowid, names_from = col, values_from = orig:new, 
              names_glue = "{col}_{.value}")
#> # A tibble: 5 x 7
#>   rowid a_orig b_orig a_avail b_avail a_new b_new
#>   <int>  <dbl>  <dbl>   <dbl>   <dbl> <dbl> <dbl>
#> 1     1      0      1       1       1     0     1
#> 2     2      1      1       1       0     1    NA
#> 3     3      0      1       1       0     0    NA
#> 4     4      0      0       0       1    NA     0
#> 5     5      0      0       0       0    NA    NA

score 2 · Answer 4 · answered Apr 07 '21 at 23:05

I think your desired output can be easily achieved by means of purrr package. In a way that instead of using across we use map2 function since we are dealing with 2 variables at the same time and we want to iterate on each pair of them row wise for our purpose:

library(dplyr)
library(purrr)

df <- tibble(a = c(0, 1, 0, 0, 0),
             a_avail = c(1, 1, 1, 0, 0),
             b = c(1, 1, 1, 0, 0),
             b_avail = c(1, 0, 0, 1, 0))


df %>%
  mutate(a_new = map2_dbl(a, a_avail, ~ ifelse(.y == 1, .x, NA)),
         b_new = map2_dbl(b, b_avail, ~ ifelse(.y == 1, .x, NA)))


# A tibble: 5 x 6
      a a_avail     b b_avail a_new b_new
  <dbl>   <dbl> <dbl>   <dbl> <dbl> <dbl>
1     0       1     1       1     0     1
2     1       1     1       0     1    NA
3     0       1     1       0     0    NA
4     0       0     0       1    NA     0
5     0       0     0       0    NA    NA

In a situation like this it would be better to mull over what function would best serve your purpose and best matches the set of arguments you have given what you want to do with them. Here since we are dealing with a row wise operation I am more comfortable to use purrr packaage functions.

Using dplyr::across with two sets of variables

4 Answers4

Linked