I would like to:

  1. Use across and case_when to check if columns A1-A3 == 1
  2. Concatenate the column names of the columns where A1-A3 == 1 and
  3. mutate a new column with the concatenated column names

My dataframe:

df <- tribble(
~ID,    ~A1,    ~A2,    ~A3,
1, 0, 1, 1, 
2, 0, 1, 1, 
3, 1, 1, 1, 
4, 1, 0, 1, 
5, 0, 1, 0)

Desired Output:

# A tibble: 5 x 5
     ID    A1    A2    A3 New_Col 
  <dbl> <dbl> <dbl> <dbl> <chr>   
1     1     0     1     1 A2 A3   
2     2     0     1     1 A2 A3   
3     3     1     1     1 A1 A2 A3
4     4     1     0     1 A1 A3   
5     5     0     1     0 A2   

So far I have tried:

df %>% 
  rowwise() %>% 
  mutate(New_Col = across(A1:A3, ~ case_when(. == 1 ~ paste0("colnames(.)", collapse = " "))))

Output (not what I want):

     ID    A1    A2    A3 New_Col$A1  $A2         $A3        
  <dbl> <dbl> <dbl> <dbl> <chr>       <chr>       <chr>      
1     1     0     1     1 NA          colnames(.) colnames(.)
2     2     0     1     1 NA          colnames(.) colnames(.)
3     3     1     1     1 colnames(.) colnames(.) colnames(.)
4     4     1     0     1 colnames(.) NA          colnames(.)
5     5     0     1     0 NA          colnames(.) NA   

What I want to learn:

  1. Is it possible to use across to check a condition across multiple columns?
  2. If so, what should the part after ~ in case_when look like to get the matching column names?
  3. How can I get a single column from mutate, across and case_when, instead of three as above?

I thought I already was able to master this task, but somehow I lost it...

TarJae

4 Answers

To use across with case_when you can do -

library(dplyr)
library(tidyr)

df %>% 
  mutate(across(A1:A3, ~case_when(. == 1 ~ cur_column()), .names = 'new_{col}')) %>%
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ')

#    ID    A1    A2    A3 New_Col 
#  <dbl> <dbl> <dbl> <dbl> <chr>   
#1     1     0     1     1 A2 A3   
#2     2     0     1     1 A2 A3   
#3     3     1     1     1 A1 A2 A3
#4     4     1     0     1 A1 A3   
#5     5     0     1     0 A2      

across creates three new columns, new_A1, new_A2 and new_A3, each holding the column name where the value is 1 and NA otherwise. unite then combines these three columns into one, New_Col, and na.rm = TRUE drops the NA entries.
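
To make the two steps easier to follow, here is a sketch that runs them separately on the data from the question. The names step1 and result are only illustrative and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- tibble::tribble(
  ~ID, ~A1, ~A2, ~A3,
  1, 0, 1, 1,
  2, 0, 1, 1,
  3, 1, 1, 1,
  4, 1, 0, 1,
  5, 0, 1, 0
)

# Step 1: across() adds one helper column per input column,
# holding the column name where the value is 1 and NA elsewhere
step1 <- df %>%
  mutate(across(A1:A3, ~ case_when(. == 1 ~ cur_column()), .names = "new_{col}"))

# Step 2: unite() pastes the helper columns together, skipping the NAs,
# and removes the helper columns afterwards
result <- step1 %>%
  unite(New_Col, starts_with("new"), na.rm = TRUE, sep = " ")
```

Printing step1 shows the intermediate new_A1, new_A2 and new_A3 columns, which can help when checking which rows matched the condition.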


Also we can use rowwise with c_across -

df %>% 
  rowwise() %>% 
  mutate(New_Col = paste0(names(.[-1])[c_across(A1:A3) == 1], collapse = ' '))
Ronak Shah

Without rowwise or across, you can also get the same result using cur_data():

df %>% group_by(ID) %>%
  mutate(new_col = paste0(names(df[-1])[as.logical(cur_data())], collapse = ' '))

# A tibble: 5 x 5
# Groups:   ID [5]
     ID    A1    A2    A3 new_col 
  <dbl> <dbl> <dbl> <dbl> <chr>   
1     1     0     1     1 A2 A3   
2     2     0     1     1 A2 A3   
3     3     1     1     1 A1 A2 A3
4     4     1     0     1 A1 A3   
5     5     0     1     0 A2 

Using . instead of df inside mutate also works:

df %>% group_by(ID) %>%
  mutate(new_col = paste0(names(.[-1])[as.logical(cur_data())], collapse = ' '))
AnilGoyal
    Awesome Anil ji and Ronak, Have one query, here `cur_data` is each group, will it work even if there are more than 1 row for each group? Because I tried `as.logical(df[-1])` and expecting a DF of `TRUE` and `FALSE` but got this error: `Error: 'list' object cannot be coerced to type 'logical'`. And what's the difference between `cur_data` and `cur_group` – Karthik S May 30 '21 at 12:03
    Hi @KarthikS, you may call me Anil, see some explanation [here](https://dplyr.tidyverse.org/reference/context.html). `cur_data` returns the current data (grouped of course) and `cur_group` represents group keys. So `cur_data` will return binary values here and `cur_group` will return ids. Hope this is clear – AnilGoyal May 30 '21 at 12:19
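
The difference discussed in these comments can be checked with a small sketch on the question's data. The names ctx and multi_row are only illustrative. The sketch assumes a dplyr version that still provides cur_data(); dplyr 1.1.0 deprecated it in favour of pick(), so a deprecation warning may appear.

```r
library(dplyr)

# Same data as in the question
df <- tibble::tribble(
  ~ID, ~A1, ~A2, ~A3,
  1, 0, 1, 1,
  2, 0, 1, 1,
  3, 1, 1, 1,
  4, 1, 0, 1,
  5, 0, 1, 0
)

# cur_data() holds the non-grouping columns of the current group,
# cur_group() holds the grouping keys of the current group
ctx <- df %>%
  group_by(ID) %>%
  summarise(data_cols = paste(names(cur_data()), collapse = " "),
            key_cols  = paste(names(cur_group()), collapse = " "))

# as.logical() on a data frame with several rows fails, which is the
# error quoted in the comment; try() captures it instead of stopping
multi_row <- try(as.logical(df[-1]), silent = TRUE)
inherits(multi_row, "try-error")
```

This suggests the cur_data() approach depends on each group having exactly one row. With several rows per group, a row-wise approach is likely the safer choice.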

Using base R

df$New_Col <- apply(df[-1], 1, \(x) paste(names(x)[as.logical(x)], collapse=' '))
df$New_Col
#[1] "A2 A3"    "A2 A3"    "A1 A2 A3" "A1 A3"    "A2"  

Or using tidyverse

library(dplyr)
library(purrr)
library(stringr)
df %>%
   mutate(New_Col = across(A1:A3, ~ c('', cur_column())[. + 1]) %>% 
                       invoke(str_c, ., sep = ' ') %>%  # join the names with spaces
                       str_squish())                     # remove gaps left by the empty strings
akrun

Another option, this time using purrr:

library(dplyr)
library(purrr)

df %>%
 mutate(New_Col = pmap_chr(across(-ID), 
                           ~ paste(names(c(...))[which(c(...) == 1)], collapse = " ")))

     ID    A1    A2    A3 New_Col 
  <dbl> <dbl> <dbl> <dbl> <chr>   
1     1     0     1     1 A2 A3   
2     2     0     1     1 A2 A3   
3     3     1     1     1 A1 A2 A3
4     4     1     0     1 A1 A3   
5     5     0     1     0 A2 
tmfmnk