Concatenate column names in one column conditional on using mutate, across and case_when

Question

I would like to:

Use across and case_when to check if columns A1-A3 == 1
Concatenate the column names of the columns where A1-A3 == 1 and
mutate a new column with the concatenated column names

My dataframe:

library(dplyr)
library(tibble)

df <- tribble(
  ~ID, ~A1, ~A2, ~A3,
  1, 0, 1, 1,
  2, 0, 1, 1,
  3, 1, 1, 1,
  4, 1, 0, 1,
  5, 0, 1, 0
)

Desired Output:

# A tibble: 5 x 5
     ID    A1    A2    A3 New_Col 
  <dbl> <dbl> <dbl> <dbl> <chr>   
1     1     0     1     1 A2 A3   
2     2     0     1     1 A2 A3   
3     3     1     1     1 A1 A2 A3
4     4     1     0     1 A1 A3   
5     5     0     1     0 A2

So far I have tried:

df %>% 
  rowwise() %>% 
  mutate(New_Col = across(A1:A3, ~ case_when(. == 1 ~ paste0("colnames(.)", collapse = " "))))

Output (not what I want):

     ID    A1    A2    A3 New_Col$A1  $A2         $A3        
  <dbl> <dbl> <dbl> <dbl> <chr>       <chr>       <chr>      
1     1     0     1     1 NA          colnames(.) colnames(.)
2     2     0     1     1 NA          colnames(.) colnames(.)
3     3     1     1     1 colnames(.) colnames(.) colnames(.)
4     4     1     0     1 colnames(.) NA          colnames(.)
5     5     0     1     0 NA          colnames(.) NA

What I want to learn:

Is it possible to use across to check a condition across multiple columns?
If so, what should the part after ~ in case_when look like to get the specific column names?
How can I get a single column after using mutate, across and case_when, rather than three as here?

I thought I already was able to master this task, but somehow I lost it...

score 10 · Accepted Answer · answered May 30 '21 at 11:12


To use across with case_when you can do -

library(dplyr)
library(tidyr)

df %>% 
  mutate(across(A1:A3, ~case_when(. == 1 ~ cur_column()), .names = 'new_{col}')) %>%
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ')

#    ID    A1    A2    A3 New_Col 
#  <dbl> <dbl> <dbl> <dbl> <chr>   
#1     1     0     1     1 A2 A3   
#2     2     0     1     1 A2 A3   
#3     3     1     1     1 A1 A2 A3
#4     4     1     0     1 A1 A3   
#5     5     0     1     0 A2

across creates 3 new columns named new_A1, new_A2 and new_A3, each holding the column name where the value is 1 and NA otherwise. unite then combines these 3 columns into a single New_Col, and na.rm = TRUE drops the NA entries.
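
To see the intermediate columns described above, here is a minimal sketch that runs only the across step and stops before unite. The object name step1 is made up for illustration, and .names uses "{.col}", the documented placeholder for the column name.

```r
library(dplyr)

# Data from the question.
df <- tibble::tribble(
  ~ID, ~A1, ~A2, ~A3,
  1, 0, 1, 1,
  2, 0, 1, 1,
  3, 1, 1, 1,
  4, 1, 0, 1,
  5, 0, 1, 0
)

# Step 1 only: across() keeps A1:A3 and adds new_A1, new_A2, new_A3.
# Each new column holds its source column's name where the value is 1,
# and NA where case_when() finds no matching condition.
step1 <- df %>%
  mutate(across(A1:A3, ~ case_when(. == 1 ~ cur_column()), .names = "new_{.col}"))

names(step1)
step1$new_A1
```

Passing step1 on to unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ') should then give the single combined column.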

Also we can use rowwise with c_across -

df %>% 
  rowwise() %>% 
  mutate(New_Col = paste0(names(.[-1])[c_across(A1:A3) == 1], collapse = ' '))
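
The names(.[-1]) lookup above relies on A1:A3 being every column except the first. A variant sketch that takes the names from the same selection instead, so extra columns would not shift the lookup; flag_cols and res are hypothetical helper names, not part of the original answer:

```r
library(dplyr)

# Data from the question.
df <- tibble::tribble(
  ~ID, ~A1, ~A2, ~A3,
  1, 0, 1, 1,
  2, 0, 1, 1,
  3, 1, 1, 1,
  4, 1, 0, 1,
  5, 0, 1, 0
)

# Names of the columns to check, resolved once from the same A1:A3 selection.
flag_cols <- names(select(df, A1:A3))

res <- df %>%
  rowwise() %>%
  # c_across() returns the current row's values for the selected columns;
  # comparing with 1 gives a logical index into flag_cols.
  mutate(New_Col = paste(flag_cols[c_across(all_of(flag_cols)) == 1], collapse = " ")) %>%
  ungroup()  # drop the rowwise grouping so later verbs see a plain tibble

res$New_Col
```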

answered May 30 '21 at 11:12

Ronak Shah


Ronak, instead of names() can we use cur_column here somehow directly? – AnilGoyal May 30 '21 at 14:03
You mean in `rowwise` or `group_by` `ID` right? I don't think we can do that since `cur_column` can be used within `across` only. – Ronak Shah May 30 '21 at 14:08
Yes, It returns this error only. Thanks for explaining :) – AnilGoyal May 30 '21 at 14:10

score 7 · Answer 2 · answered May 30 '21 at 11:17


Without rowwise/across, you can also get the same result using cur_data():

df %>% group_by(ID) %>%
  mutate(new_col = paste0(names(df[-1])[as.logical(cur_data())], collapse = ' '))

# A tibble: 5 x 5
# Groups:   ID [5]
     ID    A1    A2    A3 new_col 
  <dbl> <dbl> <dbl> <dbl> <chr>   
1     1     0     1     1 A2 A3   
2     2     0     1     1 A2 A3   
3     3     1     1     1 A1 A2 A3
4     4     1     0     1 A1 A3   
5     5     0     1     0 A2

Using . instead of df inside mutate also works:

df %>% group_by(ID) %>%
  mutate(new_col = paste0(names(.[-1])[as.logical(cur_data())], collapse = ' '))

answered May 30 '21 at 11:17

AnilGoyal


Awesome Anil ji and Ronak, Have one query, here `cur_data` is each group, will it work even if there are more than 1 row for each group? Because I tried `as.logical(df[-1])` and expecting a DF of `TRUE` and `FALSE` but got this error: `Error: 'list' object cannot be coerced to type 'logical'`. And what's the difference between `cur_data` and `cur_group` – Karthik S May 30 '21 at 12:03
Hi @KarthikS, you may call me Anil, see some explanation [here](https://dplyr.tidyverse.org/reference/context.html). `cur_data` returns the current data (grouped of course) and `cur_group` represents group keys. So `cur_data` will return binary values here and `cur_group` will return ids. Hope this is clear – AnilGoyal May 30 '21 at 12:19
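
On the multi-row question above: as.logical() on a data frame with more than one row fails with the quoted error, so the per-group approach needs exactly one row per group. A minimal sketch that works row by row even when a group holds several rows, assuming dplyr 1.1.0 or later (where pick() exists and cur_data() is deprecated in its favour). The grouping column grp is hypothetical and added only for illustration.

```r
library(dplyr)

# Data from the question plus a hypothetical grouping column `grp`
# (an assumption for illustration) so that groups have more than one row.
df <- tibble::tribble(
  ~ID, ~grp, ~A1, ~A2, ~A3,
  1, "a", 0, 1, 1,
  2, "a", 0, 1, 1,
  3, "b", 1, 1, 1,
  4, "b", 1, 0, 1,
  5, "b", 0, 1, 0
)

res <- df %>%
  group_by(grp) %>%
  mutate(New_Col = apply(
    pick(A1:A3) == 1,  # logical matrix for the current group, one row per data row
    1,                 # iterate over the rows of that matrix
    function(r) paste(names(r)[r], collapse = " ")
  )) %>%
  ungroup()

res$New_Col
```

Here the comparison with 1 turns the picked columns into a logical matrix, so no coercion of a whole data frame with as.logical() is needed.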

score 4 · Answer 3 · answered May 30 '21 at 19:43

Using base R

df$New_Col <- apply(df[-1], 1, \(x) paste(names(x)[as.logical(x)], collapse=' '))
df$New_Col
#[1] "A2 A3"    "A2 A3"    "A1 A2 A3" "A1 A3"    "A2"

Or using tidyverse

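library(dplyr)
library(purrr)
library(stringr)

# Note: invoke() is deprecated as of purrr 1.0.0 but still works.
df %>%
   mutate(New_Col = across(A1:A3, ~ c('', cur_column())[. + 1]) %>%
                       invoke(str_c, ., sep = ' ') %>%  # without sep the names run together ("A2A3")
                       str_squish())                    # remove the gaps left by the empty strings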

edited May 30 '21 at 19:57

answered May 30 '21 at 19:43

akrun


score 3 · Answer 4 · answered May 30 '21 at 11:18

One option that also uses purrr:

library(dplyr)
library(purrr)

df %>%
  mutate(New_Col = pmap_chr(across(-ID),
                            ~ paste(names(c(...))[which(c(...) == 1)], collapse = " ")))

     ID    A1    A2    A3 New_Col 
  <dbl> <dbl> <dbl> <dbl> <chr>   
1     1     0     1     1 A2 A3   
2     2     0     1     1 A2 A3   
3     3     1     1     1 A1 A2 A3
4     4     1     0     1 A1 A3   
5     5     0     1     0 A2
