0

Question

In R, can I used a vector that holds the names of data frame columns to avoid repeated code?

vec_columns <- c("col1", "col2", "col8", "col54")

Background

I started off looking to solve the problem that is answered in this question, which works on its own terms: Coalesce columns and create another column to specify source

But in my specific use case, I have many columns and the ones that I want to coalesce() are not adjacent to each other, so the <tidy-select> used in that solution doesn't work for me.

Modified Example from Original Question

In the original question, the OP was using contiguous columns that began in column #1, but I have leading columns that are not part of the coalesce(), plus the columns I want to coalesce() are separated from each other.

df_2 <-
  data.frame(
    Name = c("A", "B", "C", "D", "E"),  #Adding a name column not in original posted question
    group_1 = c(NA, NA, NA, NA, 2),
    group_2 = c(NA, 4, NA, NA, 1),
    group_3 = c(NA, NA, 5, NA, NA),
    group_4 = c(1, NA, NA, 2, NA),
    group_5 = c(NA, 3, NA, NA, NA)
  )

> df_2
  Name group_1 group_2 group_3 group_4 group_5
1    A      NA      NA      NA       1      NA
2    B      NA       4      NA      NA       3
3    C      NA      NA       5      NA      NA
4    D      NA      NA      NA       2      NA
5    E       2       1      NA      NA      NA

This solution below I created and does exactly what I want for output:

df_2 %>% 
 mutate(one_col = coalesce(group_2, group_3, group_5)) %>% 
 rowwise() %>%
 mutate(group_col = c("group_2", "group_3", "group_5")[!is.na(c_across(c(group_2, group_3, group_5)))][1])

# A tibble: 5 x 8
# Rowwise: 
  Name  group_1 group_2 group_3 group_4 group_5 one_col group_col
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>    
1 A          NA      NA      NA       1      NA      NA NA       
2 B          NA       4      NA      NA       3       4 group_2  
3 C          NA      NA       5      NA      NA       5 group_3  
4 D          NA      NA      NA       2      NA      NA NA       
5 E           2       1      NA      NA      NA       1 group_2  

The Problem

But as you can see, I have to repeat those column names 3 times. Future proofing myself, I envision where I might want or 10 columns in the coalesce(). I want to just set a variable holding a vector of column names once, but doing the obvious myvec <- c("group_2", "group_3", "group_5") and inserting it doesn't work.

EDIT

Reading the comments, I got to an answer that meets the need, but I'm reluctant to answer it myself because credit goes to the commenter(s).

This achieves what I wanted:

myvec <- c("group_2", "group_3","group_5")

df_2 %>% 
  mutate(one_col = coalesce(!!!select(.,myvec))) %>% rowwise() %>% 
  mutate(group_col = myvec[!is.na(c_across(myvec))][1])

# A tibble: 5 x 8
# Rowwise: 
  Name  group_1 group_2 group_3 group_4 group_5 one_col group_col
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>    
1 A          NA      NA      NA       1      NA      NA NA       
2 B          NA       4      NA      NA       3       4 group_2  
3 C          NA      NA       5      NA      NA       5 group_3  
4 D          NA      NA      NA       2      NA      NA NA       
5 E           2       1      NA      NA      NA       1 group_2  

The main goal here is really to get the group_col column which records which column was used in the coalesce(), but with the flexibility to handle non-adjacent columns, and also to change the order of the columns in the coalesce(). For example, below I reverse the order of the myvec so that row 2, "B" selects 3 and group_5, instead of 4 and group_2 in the above example.

> myvec2 <- c("group_5", "group_3","group_2")
> df_2 %>% 
+   mutate(one_col = coalesce(!!!select(.,myvec2))) %>% rowwise() %>% 
+   mutate(group_col = myvec2[!is.na(c_across(myvec2))][1])
Note: Using an external vector in selections is ambiguous.
i Use `all_of(myvec2)` instead of `myvec2` to silence this message.
i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This message is displayed once per session.
# A tibble: 5 x 8
# Rowwise: 
  Name  group_1 group_2 group_3 group_4 group_5 one_col group_col
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>    
1 A          NA      NA      NA       1      NA      NA NA       
2 B          NA       4      NA      NA       3       3 group_5  
3 C          NA      NA       5      NA      NA       5 group_3  
4 D          NA      NA      NA       2      NA      NA NA       
5 E           2       1      NA      NA      NA       1 group_2  

I am mildly concerned about the message I received about external vector in selections is ambiguous, so perhaps I'll need to use that all_of() function, but this did work.

ScottyJ
  • 945
  • 11
  • 16
  • But you might show how you attempted `myvec`, as it might not be obvious, nor work. And inexplicably, lead to a solution. – Chris May 06 '22 at 05:02
  • Does this answer your question? [Coalesce columns based on pattern in R](https://stackoverflow.com/questions/56776187/coalesce-columns-based-on-pattern-in-r) – benson23 May 06 '22 at 05:25
  • Something like `mutate(one_col = coalesce(!!!select(., myvec)))`, and replace subsequent column names with `myvec` – benson23 May 06 '22 at 05:26
  • @benson23 It's not so much about the `coalesce()` itself as how to handle the vector in the three different places as they seem to require being used three different ways. I was able to use your suggestion above and I got this to provide the output I want: `df_2 %>% mutate(one_col = coalesce(!!!select(.,myvec))) %>% rowwise() %>% mutate(group_col = myvec[!is.na(c_across(myvec))][1])` I think part of my problem is that I see this`!!!` operator and I really don't know what it is and Google search for "r operator !!!" isn't helpful for me. – ScottyJ May 07 '22 at 15:18
  • see last example in ?dplyr::coalesce, as searching for ?`!!!` did nothing for me, but in words, combine like objects (in that instance vectors) by members that aren't NA, or so I'm guessing. – Chris May 07 '22 at 15:38
  • Ah, looking in the wrong place, '!!!' is rlang ?rlang::`!!!`. – Chris May 07 '22 at 16:02
  • @Chris Okay, I found it. The "splice-operator" in rlang. I've seen it, but am just trying to figure out how it works now. – ScottyJ May 07 '22 at 17:13
  • Qualify (would this lead to reasonably useful stuff?), then splice; though one can imagine an alternate universe where spliced NA is desirable. – Chris May 07 '22 at 18:35

0 Answers0